[Xapian-discuss] Queryparser problem..

Olly Betts olly at survex.com
Sun Dec 9 11:42:03 GMT 2007


On Sun, Dec 09, 2007 at 08:16:17AM +0100, Jesper Krogh wrote:
> The queryparser in my setup is using strategy STEM_SOME which seems to 
> give the best handling of the data in our setup.
> 
> But the queryparser doesn't really seem to be consistent.
> doc:test
> Running query 'Xapian::Query(ZDOCTYPEtest:(pos=1))'
> 
> Here it applies stemming to the term before running the query (Z-prefix)
> 
> doc:1234
> Running query 'Xapian::Query(DOCTYPE1234:(pos=1))'
> 
> There it skips the stemming.
> 
> What is the reason for behaving differently based on user-input?

http://www.xapian.org/docs/termgenerator.html

    Now we index all terms lowercased with positional information, and
    also stemmed with a 'Z' prefix (unless they start with a digit) [...]

Indexing terms which start with a digit twice just bloats the database.
I'm not aware of a language where words can start with a digit, and it
can actually harm retrieval if we attempt to stem part numbers and other
codes.
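
As a rough illustration of that rule (a simplified model only, not Xapian's
actual code: toy_stem and query_term are hypothetical names, and the real
stemmer is language-specific), the choice between the two term forms in
your examples can be sketched as:

```python
def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def query_term(prefix: str, word: str) -> str:
    """Build the term for a prefixed word, following the rule quoted above.

    Words starting with a digit are left unstemmed (no 'Z' marker);
    everything else is stemmed and given the 'Z' marker before the prefix.
    """
    word = word.lower()
    if word[:1].isdigit():
        # Part numbers and other codes: keep as-is, no stemmed form.
        return prefix + word
    return "Z" + prefix + toy_stem(word)


print(query_term("DOCTYPE", "test"))  # ZDOCTYPEtest
print(query_term("DOCTYPE", "1234"))  # DOCTYPE1234
```

This only models the digit check described in the documentation excerpt; it
assumes the 'doc' field maps to the DOCTYPE prefix as in your output.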

Cheers,
    Olly


