[Xapian-discuss] queryparser thinks ø is o

R. Mattes rm at seid-online.de
Sun Aug 28 13:14:15 BST 2005


On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
> Hey. I'm having some problems with the Xapian QueryParser using the
> Perl bindings. It turns all Scandinavian characters into their closest
> English-alphabet equivalents. See the following example:
> 
> $qp->set_stemmer($stemmer);
> print $qp->parse_query('bølle')."\n";
> print $stemmer->stem_word('bølle')."\n";
> 
> Returns
> 
> marcus at ds1:~/src/Horus-Indexer$ ./stemtest
> Xapian::Query(bolle:(pos=1))
> bølle
> 
> So, I'm pretty sure it's not the stemmer. Any other ideas?

Lots of :-)
Yes, the queryparser itself modifies characters. The code that does this
is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
this is a rather "murky" and anglocentric part of the Xapian codebase.

Frankly, I just removed the offending parts of the code - but a cleaner
solution would be preferable. My current approach would be to make 
the static tables in 'xapian/xapian-core/queryparser/symboltab.h' 
configurable by language (sigh, not enough time right now).
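
To make the "configurable by language" idea concrete, here is a minimal
sketch. It is not the actual Xapian code: the names AccentTable,
default_table, norwegian_table and normalise are hypothetical, and it
assumes terms are ISO-8859-1 (Latin-1) byte strings, so that 'ø' is the
single byte 0xF8.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-language table: Latin-1 byte -> ASCII replacement.
// (Assumption: terms are ISO-8859-1 encoded, one byte per character.)
using AccentTable = std::map<unsigned char, std::string>;

const unsigned char E_ACUTE = 0xE9;  // é
const unsigned char A_RING  = 0xE5;  // å
const unsigned char AE_LIG  = 0xE6;  // æ
const unsigned char O_SLASH = 0xF8;  // ø

// A default table that folds every accented letter it knows about.
AccentTable default_table() {
    AccentTable t;
    t[E_ACUTE] = "e";
    t[A_RING]  = "a";
    t[AE_LIG]  = "ae";
    t[O_SLASH] = "o";
    return t;
}

// A Norwegian table: å, æ and ø are letters in their own right,
// so they are taken out of the folding table and pass through unchanged.
AccentTable norwegian_table() {
    AccentTable t = default_table();
    t.erase(A_RING);
    t.erase(AE_LIG);
    t.erase(O_SLASH);
    return t;
}

// Replace each byte found in the table; copy every other byte as-is.
std::string normalise(const std::string& term, const AccentTable& table) {
    std::string out;
    for (unsigned char ch : term) {
        AccentTable::const_iterator it = table.find(ch);
        if (it != table.end())
            out += it->second;
        else
            out += static_cast<char>(ch);
    }
    return out;
}

// Small self-check using the word from the original report.
void demo() {
    const std::string boelle = "b\xF8lle";  // "bølle" in Latin-1
    assert(normalise(boelle, default_table()) == "bolle");
    assert(normalise(boelle, norwegian_table()) == boelle);
}
```

With the default table 'bølle' collapses to 'bolle', which is the
behaviour reported above; with the Norwegian table the term is left
alone. Choosing the table would presumably have to be tied to the
language passed to the stemmer, but that part is not sketched here.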

 HTH Ralf Mattes

> Marcus
> 
> PS. (For your info, 'bølle' means 'bully' and 'bolle' means 'bowl'.)
> PPS. I've implemented the set_parser function in QueryParser. It
> should work, and I get the same results with set_stemming_options. :)
> 
> Marcus
