Re: [Xapian-discuss] queryparser thinks ø is o
Marcus Ramberg
marcus at startsiden.no
Mon Aug 29 13:18:23 BST 2005
On Aug 28, 2005, at 2:14 PM, R. Mattes wrote:
> On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
>
>> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
>> Xapian::Query(bolle:(pos=1))
>> bølle
>> So, I'm pretty sure it's not the stemmer. Any other ideas?
>
> Lots of :-)
> Yes, the queryparser itself modifies characters. The code that does this
> is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
> this is a rather "murky" and anglocentric part of the Xapian codebase.
>
> Frankly, I just removed the offending parts of the code, but a cleaner
> solution would be preferable. My current approach would be to make the
> static tables in 'xapian/xapian-core/queryparser/symboltab.h'
> configurable by language (sigh, not enough time right now).
Hey Ralf,
Thanks for the tips. However, disabling the action in the normalizer
makes the queryparser tokenize on æøå instead of including them in
the term. Where can I modify the tokenizer in the queryparser so that it
includes high-ASCII chars (or at least the ones I need)?
Marcus