[Xapian-discuss] correct use of stem

Olly Betts olly at survex.com
Sat Jan 13 02:17:08 GMT 2007


On Sat, Jan 06, 2007 at 12:34:25PM -0800, Alexander Lind wrote:
> Currently I pre-parse every incoming query in my own code before I send
> it on to Xapian.
> Among other things, I determine which parts of a query are actual search
> words, and I then manually stem each one of them with Xapian's
> Stem::stem_word().
>
> I was wondering if this is the correct approach, or is there a better
> way of doing this, i.e. have Xapian automatically do it on the words in
> the query itself, when the query string is passed on to
> QueryParser::parse_query()?

Yes, it's much better to let the QueryParser do the stemming.  Just
call Xapian::QueryParser::set_stemmer().

I'd say it's a mistake to manipulate user-specified query strings
before passing them to the QueryParser.  To do it properly you'd
essentially need to duplicate how the QueryParser parses a query
string, and the exact handling of corner cases (especially oddly
formed queries, such as a phrase search with unmatched quotes) is open
to change.  So even if you reverse engineer the current behaviour, it
might be different in a future release.

Cheers,
    Olly


