[Xapian-discuss] QueryParser lowercase / uppercase and stemming
dd
ddturbo at gmx.de
Wed May 17 15:17:39 BST 2006
Hello.
There are several problems I couldn't find a solution for.
1. QueryParser does not perform stemming
I am working with PHP5 and use the Xapian wrapper written by Daniel Ménard.
I build a query using parseQuery. The output of the parsed query shows that
the terms are not stemmed, although a stemmer is set (see code snippet):
# create a XapianDatabase object to search in
$db = new XapianDatabase($path2db);
# every query needs a XapianEnquire object, i.e. one specifying the
# database to search in
$enquire = new XapianEnquire($db);
# create the XapianQueryParser object and configure stemming
$myQueryParser = new XapianQueryParser();
$myQueryParser->setDatabase($db);
$stemmer = new XapianStemmer("german");
$myQueryParser->setStemmer($stemmer);
$myQueryParser->setStemmingStrategy(STEM_ALL);
#$querystring = removeUmlaute($querystring);
#wildcard search
$myQuery = $myQueryParser->parseQuery($querystring,
    Xapian::FLAG_PHRASE|Xapian::FLAG_BOOLEAN|Xapian::FLAG_LOVEHATE|Xapian::FLAG_WILDCARD);
...
So what am I doing wrong?
My second question: is there any way to stop QueryParser from
lowercasing the query string? At least for exact phrase matching this
would be quite useful. (The data is indexed in both upper- and
lowercase.)
The third issue is the encoding of non-ASCII characters (I hope I didn't
miss something in the mailing list archives). After applying the UTF-8
patch for Xapian version 0.9.5, characters like ä, ö and ü cause terms to
be parsed incorrectly (e.g. Köln is processed to 'k' and 'n').
Surprisingly, using the unpatched xapian-core and building a query without
QueryParser results in exact matches when searching for, e.g., 'Köln'.
What is going on here?
Thanks for any help
DD