Re: [Xapian-discuss] queryparser thinks ø is o

Marcus Ramberg marcus at startsiden.no
Mon Aug 29 13:18:23 BST 2005


On Aug 28, 2005, at 2:14 PM, R. Mattes wrote:

> On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
>
>> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
>> Xapian::Query(bolle:(pos=1))
>> bølle
>> So, I'm pretty sure it's not the stemmer. Any other ideas?
>
> Lots of :-)
> Yes, the queryparser itself modifies characters. The code that does this
> is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
> this is a rather "murky" and anglocentric part of the Xapian codebase.
>
> Frankly, I just removed the offending parts of the code - but a cleaner
> solution would be preferable. My current approach would be to make
> the static tables in 'xapian/xapian-core/queryparser/symboltab.h'
> configurable by language (sigh, not enough time right now).

Hey Ralf,

Thanks for the tips. However, disabling the action in the normalizer
makes the queryparser tokenize on æøå instead of including them in
the term. Where can I modify the tokenizer in the queryparser to
include high-ASCII chars (or at least the ones I need)?
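
To make the request concrete, here is a minimal sketch of the behaviour
being asked for. It is not Xapian's code: `is_word_byte` and `tokenize`
are hypothetical names, and the sketch assumes the input is a plain byte
string (Latin-1 or UTF-8) in which every byte of 0x80 or above should
count as part of a word.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): a byte belongs to a word if it
// is an ASCII letter or digit, or if it is any byte >= 0x80. The second
// case covers Latin-1 letters such as æøå as well as UTF-8 lead and
// continuation bytes.
bool is_word_byte(unsigned char c) {
    return c >= 0x80 || std::isalnum(c) != 0;
}

// Hypothetical tokenizer: split the input into runs of word bytes.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char c : text) {
        if (is_word_byte(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            // A non-word byte ends the current term.
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        terms.push_back(current);
    }
    return terms;
}
```

With a check like this, "bølle" stays a single term in either encoding
instead of being split at the ø. The trade-off is that it also accepts
high bytes that are not letters (for example Latin-1 punctuation), so a
per-language table would be more precise.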

Marcus

