[Xapian-discuss] QueryParser lowercase / uppercase and stemming

dd ddturbo at gmx.de
Wed May 17 15:17:39 BST 2006


Hello.

There are several problems for which I couldn't find a solution.
1. QueryParser does not perform stemming
I am working with PHP5 and use the Xapian wrapper written by Daniel Ménard.
I build a query using parseQuery(). The output of the parsed query shows
that the terms are not stemmed, although a stemmer is set (see the code
snippet below):

    # create a XapianDatabase object to search in
    $db = new XapianDatabase($path2db);
    # every query needs a XapianEnquire object, i.e. specifying the
    # database to search in
    $enquire = new XapianEnquire($db);

    # create a XapianQueryParser object
    $myQueryParser = new XapianQueryParser();
    $myQueryParser->setDatabase($db);

    $stemmer = new XapianStemmer("german");

    $myQueryParser->setStemmer($stemmer);
    $myQueryParser->setStemmingStrategy(STEM_ALL);

    #$querystring = removeUmlaute($querystring);

    # wildcard search (phrase, boolean and love/hate operators enabled too)
    $myQuery = $myQueryParser->parseQuery($querystring,
        Xapian::FLAG_PHRASE | Xapian::FLAG_BOOLEAN | Xapian::FLAG_LOVEHATE | Xapian::FLAG_WILDCARD);
 ...


So what am I doing wrong?

The second thing I wondered about: is there any way to stop QueryParser
from lowercasing the query string? At least for exact phrase matching,
preserving case would be quite useful. (The data is indexed in both upper-
and lowercase.)

Another issue is the encoding of non-ASCII characters (I hope I haven't
missed something in earlier postings to the mailing list). After applying
the UTF-8 patch for Xapian 0.9.5, characters like ä, ö and ü cause terms to
be parsed incorrectly (e.g. Köln is processed into 'k' and 'n').
Surprisingly, using the unpatched xapian-core and building a query without
QueryParser gives exact matches when searching for, e.g., 'Köln'.

So what about this?

Thanks for any help

DD
