[Xapian-discuss] QueryParser lowercase / uppercase and stemming

Olly Betts olly at survex.com
Wed May 17 22:55:23 BST 2006


On Wed, May 17, 2006 at 04:17:39PM +0200, dd wrote:
> 1. QueryParser does not perform stemming

It does!

>    $myQueryParser->setStemmingStrategy(STEM_ALL);

I'm not that familiar with Daniel's wrappers, but my guess is that
STEM_ALL isn't the correct name for this constant, so you're passing the
string "STEM_ALL", which probably gets interpreted as 0, meaning
"don't stem anything".

One of PHP's nastier features that...

> Another thing is the encoding of non ascii chars (I hope I didn't miss 
> something in the postings of the mailing list). After applying UTF-8 
> patch for xapian version 0.9.5, characters like ä ö ü cause a mistake in 
> parsing a term (e.g. Köln is processed to 'k' and 'n').

You want to modify accentnormalisingitor.h too:

http://article.gmane.org/gmane.comp.search.xapian.general/1927

> Surprisingly using the unpatched xapian-cores and building a query
> without queryparser results in exact matches when searching for
> example for 'Köln'.

If you create a term with Query("term"), you get *exactly* what you pass
as the term, even arbitrary binary data: if you pass a C++ std::string
containing zero bytes, the term will contain zero bytes.

But by its nature, QueryParser has to split the string it's given into
words, so there's the question of what is and isn't a "word character".

Cheers,
    Olly


