[Xapian-discuss] splitting words algorithm - indexer vs. queryparser

Richard Boulton richard at lemurconsulting.com
Fri Dec 26 19:24:30 GMT 2008


On Fri, Dec 26, 2008 at 10:10:04AM -0500, tata 668 wrote:
> 1) Being able to use the exact same algorithm to split words when adding text to a document and
> when parsing a query (with the queryparser).

Not quite the same algorithm (because you don't want to handle things like
"AND" and "OR" and brackets in a query the same way as in a document), but
the TermGenerator class does what you want.

http://xapian.org/docs/sourcedoc/html/classXapian_1_1TermGenerator.html
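
To illustrate why the two algorithms cannot be exactly identical, here is a
toy sketch in plain Python. It is not Xapian's actual algorithm: the names
split_words, index_terms and query_tokens are hypothetical, and the word
splitting rule (lowercased runs of word characters) is an assumption made
for the example. Both paths share one word splitter, but only the query
path gives "AND", "OR" and brackets a special meaning.

```python
import re

WORD = re.compile(r"\w+")
OPERATORS = {"AND", "OR"}  # assumed operator set, for illustration only


def split_words(text):
    """Shared word splitter: lowercased runs of word characters."""
    return [w.lower() for w in WORD.findall(text)]


def index_terms(text):
    """Document side: every word is an ordinary term, even 'AND' or 'OR'."""
    return split_words(text)


def query_tokens(query):
    """Query side: operators and brackets are recognised before splitting."""
    tokens = []
    for piece in re.findall(r"[()]|[^\s()]+", query):
        if piece in ("(", ")"):
            tokens.append(("BRACKET", piece))
        elif piece in OPERATORS:
            tokens.append(("OP", piece))
        else:
            # Everything else goes through the same splitter as documents
            tokens.extend(("TERM", w) for w in split_words(piece))
    return tokens


text = "cats AND (dogs OR birds)"
doc_terms = index_terms(text)
parsed = query_tokens(text)
```

With the same input string, index_terms keeps "and" and "or" as plain
terms, while query_tokens marks them as operators and only splits the
remaining pieces into terms.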

> 2) Is it possible to set the "content" (the postings) of a document by passing the whole text at
> once, without having to split the words ourselves and add each word one by one? It would be
> ideal for Xapian to use its internal word-splitting algorithm, the same one the queryparser
> uses afterwards.

That's what the term generator does for you.

-- 
Richard


