[Xapian-discuss] Queryparser problem..

Jesper Krogh jesper at krogh.cc
Sun Dec 9 18:06:36 GMT 2007


Olly Betts wrote:
> On Sun, Dec 09, 2007 at 08:16:17AM +0100, Jesper Krogh wrote:
>> The queryparser in my setup is using strategy STEM_SOME, which seems to
>> give the best handling of the data in our setup.
>>
>> But the queryparser doesn't really seem to be consistent.
>> doc:test
>> Running query 'Xapian::Query(ZDOCTYPEtest:(pos=1))'
>>
>> Here it applies stemming to the term before running the query (Z-prefix)
>>
>> doc:1234
>> Running query 'Xapian::Query(DOCTYPE1234:(pos=1))'
>>
>> There it skips the stemming.
>>
>> What is the reason for behaving differently based on user input?
> 
> http://www.xapian.org/docs/termgenerator.html
> 
>     Now we index all terms lowercased with positional information, and
>     also stemmed with a 'Z' prefix (unless they start with a digit) [...]
> 
> Indexing terms which start with a digit twice just bloats the database.
> I'm not aware of a language where words can start with a digit, and it
> can actually harm retrieval if we attempt to stem part numbers and other
> codes.
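
(A minimal sketch of the rule quoted above, to make the two query outputs easier to compare. It does not use Xapian itself: `index_terms` is a hypothetical helper, and the `stem` argument stands in for whatever stemmer is configured, so this only models the documented behaviour and is not the library's implementation.)

```python
def index_terms(word, stem, prefix=""):
    """Model the quoted rule: index the lowercased word, and also a
    'Z'-prefixed stemmed form unless the word starts with a digit."""
    w = word.lower()
    terms = [prefix + w]
    if not w[:1].isdigit():
        # Only non-numeric words get the extra stemmed term.
        terms.append("Z" + prefix + stem(w))
    return terms


def no_stem(w):
    # Placeholder stemmer (assumption): returns the word unchanged.
    return w


print(index_terms("test", no_stem, "DOCTYPE"))
print(index_terms("1234", no_stem, "DOCTYPE"))
```

With the prefix DOCTYPE this gives a Z-prefixed term for "test" but only the plain term for "1234", which matches the two queries shown in the original question.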

OK, thanks.

I'm probably just (mis)using Xapian anyway. The problem is that every
document should be traceable after retrieval. Thus I add:
doctype:<type> and id:<id>
The "viewer" application then knows what to do, and I can look the
document up and replace it by letting the indexer query for "doctype:<>
id:<>" before doing add_ or replace_.

This worked flawlessly until my "doctype" actually was stemmable..

How do people generally solve this task? (Adding a 0 in front of my
doctype would solve the problem.. but is that elegant?)
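
One possible approach, sketched here only as an illustration and not as an answer given in this thread: combine doctype and id into a single exact-match term that never goes through the stemmer, and use that one term for the lookup. The helper name `unique_term` and the "Q" prefix (the prefix commonly used for unique IDs in Xapian setups) are assumptions for the example.

```python
def unique_term(doctype, doc_id, prefix="Q"):
    """Build one exact-match term identifying a document.

    The term is constructed directly by the indexer, so no stemming
    is applied, whether or not the doctype looks like a word.
    """
    return f"{prefix}{doctype}:{doc_id}"


# Hypothetical usage: the indexer would hand this term to
# WritableDatabase.replace_document(term, document), which replaces
# the document indexed by that term or adds it if none exists.
term = unique_term("test", 1234)
print(term)
```

On the query side, registering the field with QueryParser's add_boolean_prefix should have a similar effect, since boolean filter terms are not stemmed.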



Jesper .. we're using a "homebrewed" termgenerator and try to play
nice with how the queryparser expects the dataset to be.

-- 
Jesper Krogh



