[Xapian-discuss] Grouping document paragraphs

Yannick Warnier ywarnier at beeznest.org
Mon Jan 14 22:51:10 GMT 2008


Hi again,

I'm trying to get from the docs how I should (ideally) index a document
with a reference to a physical file, but by indexing several paragraphs
of this document separately.

From the PHP examples (which are very nice, thank you), I get that you
could/should index paragraphs separately if you want to record a notion
of different sections for this document, but what's the best way of
indexing one document and having its paragraphs indexed separately?

Should I index all the paragraphs as separate Xapian::Document objects
and attach the same file URI as the "data" of each of them?

I can see one blocking problem in my case:
I want to index documents split into paragraph types (title, abstract,
main content, etc.), and then I want to retrieve every document that
contains both "trains" AND "wheels", wherever in that document's
paragraphs the two terms occur (possibly in different paragraphs).
How could that work?
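
To make the problem concrete, here is a toy model in plain Python. It
does not use the Xapian API at all; the sample paragraphs, the file
URIs and the function names are made up for illustration. It only shows
why an AND query run against per-paragraph records misses files whose
terms sit in different paragraphs, and what grouping by file URI would
have to achieve:

```python
from collections import defaultdict

# Hypothetical sample data: (file URI, paragraph type, paragraph text).
# Each tuple stands for one separately indexed paragraph.
PARAGRAPHS = [
    ("file:///docs/a.txt", "title", "A history of trains"),
    ("file:///docs/a.txt", "main", "Steel wheels replaced wooden ones"),
    ("file:///docs/b.txt", "title", "Trains and their wheels"),
    ("file:///docs/c.txt", "abstract", "Wheels on carts"),
]


def paragraph_matches(terms):
    """Return URIs where one single paragraph contains every term."""
    wanted = {t.lower() for t in terms}
    return {
        uri
        for uri, _kind, text in PARAGRAPHS
        if wanted <= set(text.lower().split())
    }


def file_matches(terms):
    """Return URIs whose paragraphs, taken together, contain every term."""
    words_by_uri = defaultdict(set)
    for uri, _kind, text in PARAGRAPHS:
        # Pool the words of all paragraphs that share a file URI.
        words_by_uri[uri].update(text.lower().split())
    wanted = {t.lower() for t in terms}
    return {uri for uri, words in words_by_uri.items() if wanted <= words}
```

With this data, paragraph_matches(["trains", "wheels"]) finds only
b.txt, while file_matches(["trains", "wheels"]) also finds a.txt, which
is the behaviour I am after.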

I suppose, then, that there is a way to group Xapian::Document objects
into some kind of higher-level document... but so far I haven't found
any documentation covering that.

I will publish any info on this or the two earlier questions on the
wiki.

Yannick



