[Xapian-discuss] queryparser thinks ø is o
R. Mattes
rm at seid-online.de
Sun Aug 28 13:14:15 BST 2005
On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
> Hey. I'm having some problems with the Xapian QueryParser using the
> Perl bindings. It turns all Scandinavian characters into their nearest
> equivalents in the English alphabet. See the following example:
>
> $qp->set_stemmer($stemmer);
> print $qp->parse_query('bølle')."\n";
> print $stemmer->stem_word('bølle')."\n";
>
> Returns
>
> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
> Xapian::Query(bolle:(pos=1))
> bølle
>
> So, I'm pretty sure it's not the stemmer. Any other ideas?
Lots of :-)
Yes, the queryparser itself modifies characters. The code that does this
is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
this is a rather "murky" and anglocentric part of the Xapian codebase.
Frankly, I just removed the offending parts of the code, but a cleaner
solution would be preferable. My current plan is to make
the static tables in 'xapian/xapian-core/queryparser/symboltab.h'
configurable by language (sigh, not enough time right now).
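To illustrate what "configurable by language" could look like, here is a
minimal sketch. It is not Xapian code: the names AccentTable, make_table
and normalise, the language codes, and the choice of which characters get
folded are all made up for illustration, and it assumes ISO-8859-1
(Latin-1) input with one byte per character.

```cpp
#include <array>
#include <string>

// Hypothetical sketch, not Xapian code. Assumes ISO-8859-1 (Latin-1)
// input, i.e. exactly one byte per character.
using AccentTable = std::array<unsigned char, 256>;

// Build a folding table for a language code. The codes ("no", "da", ...)
// and the set of folds are illustrative assumptions only.
AccentTable make_table(const std::string& lang) {
    AccentTable t;
    // Start from the identity mapping: every byte maps to itself.
    for (int i = 0; i < 256; ++i) t[i] = static_cast<unsigned char>(i);

    // Folds assumed (for this sketch) to be acceptable in every language.
    t[0xE9] = 'e';  // e with acute accent
    t[0xE8] = 'e';  // e with grave accent

    // Fold o-with-stroke only where it is not a distinct letter.
    if (lang != "no" && lang != "da") {
        t[0xF8] = 'o';  // lower-case o with stroke
        t[0xD8] = 'O';  // upper-case O with stroke
    }
    return t;
}

// Apply a folding table to a Latin-1 encoded term, byte by byte.
std::string normalise(const std::string& term, const AccentTable& t) {
    std::string out;
    out.reserve(term.size());
    for (unsigned char ch : term) out += static_cast<char>(t[ch]);
    return out;
}
```

With these tables, normalise("b\xF8lle", make_table("no")) leaves the term
unchanged, while make_table("en") turns it into "bolle", which is the
behaviour Marcus is seeing today for every language.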
HTH, Ralf Mattes
> Marcus
>
> Ps. ( for your info, bølle eq bully, and bolle eq 'bowl' )
> Pps. I've implemented the set_parser function in QueryParser. It
> should work, and I get the same results with set_stemming_options. :)
>
> Marcus
>
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss