[Xapian-discuss] STEM_SOME (was: Custom Stemming and QueryParser)

Olly Betts olly at survex.com
Wed Aug 20 02:48:07 BST 2008


On Wed, Aug 13, 2008 at 12:55:46PM -0400, Mike Boone wrote:
> The issue appears to boil down to this. I am trying to parse a query
> with STEM_SOME set. It's described in the docs as "Search for stemmed
> forms of terms except for those which start with a capital letter".
> 
> I am custom-stemming a few words, which are stored in the index
> prefixed with XZ.
> 
> I tried to prefix these myself before sending them to the query
> parser, but they get stemmed anyway:

I should insert the standard warning here that it's generally not a good
idea to try to "adjust" the input to the QueryParser.  Since it aims to
parse potentially free-form input from users as well as boolean
structure, you'd generally have to build the equivalent of QueryParser
and the equivalent of un-QueryParser to avoid unexpected handling of
some cases.

There ought to be a way to control parsing and manipulation of terms
by slotting bits of code into the QueryParser framework, but currently
there are just some settings like a stemmer object to use and the
stemming strategy.

> "XZiis AND sharp" (no quotes) gets parsed as Xapian::Query((xziis:(pos=1) AND
> Zsharp:(pos=2))). The first term should be XZiis.

No, the word the user specified is "XZiis".  The "XZ" here isn't a
term-prefix - that's an implementation detail invisible to the user.

If this worked how you seem to want, then a query for "Swordplay" would
be interpreted as an "S" prefixed term and actually match "wordplay" in
the title instead!
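
As a rough illustration of that pitfall, here is a toy sketch in Python.
It is not how QueryParser is written: naive_split and KNOWN_PREFIXES are
hypothetical names invented for this example, and the rule they implement
is deliberately the wrong one.

```python
# Toy illustration only; this is not Xapian code.
# "XZ" is the custom prefix from this thread, "S" the title prefix
# mentioned above.
KNOWN_PREFIXES = ("XZ", "S")


def naive_split(word):
    """Hypothetical, *wrong* rule: treat a leading known prefix as a term prefix."""
    for prefix in KNOWN_PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):]
    return "", word


print(naive_split("XZiis"))      # ('XZ', 'iis')
print(naive_split("Swordplay"))  # ('S', 'wordplay')
```

The second call shows the problem: an ordinary capitalised word is
silently turned into a title search for "wordplay".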

> If I try to use add_prefix('custom','XZ'), "custom:iis AND sharp" (no
> quotes) is parsed as Xapian::Query((ZXZii:(pos=1) AND
> Zsharp:(pos=2))).

Yes, because "iis" starts with a lower-case "i", so with STEM_SOME we
stem it.  The resulting term is "Z" (which marks a stemmed term), then
your "XZ" prefix, then the stem "ii".

> What I'm trying to get is: Xapian::Query((XZiis:(pos=1) AND Zsharp:(pos=2)))

Try this (the leading capital stops STEM_SOME from stemming the word):

custom:IIS AND sharp
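
The term forms discussed in this message can be reproduced with a small
toy model in Python.  This is only a sketch under stated assumptions, not
Xapian's implementation: toy_stem is a hypothetical stand-in for a real
stemmer, stem_some_term is an invented name, and the real STEM_SOME
handling covers more cases than the single capital-letter rule modelled
here.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: just drops one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 1 else word


def stem_some_term(word, prefix=""):
    """Simplified model of STEM_SOME for a single word.

    A word starting with a capital letter is kept unstemmed (but
    lower-cased).  Any other word is stemmed and marked with a leading
    "Z", placed ahead of any field prefix.
    """
    if word[:1].isupper():
        return prefix + word.lower()
    return "Z" + prefix + toy_stem(word.lower())


print(stem_some_term("XZiis"))      # xziis
print(stem_some_term("iis", "XZ"))  # ZXZii
print(stem_some_term("IIS", "XZ"))  # XZiis
print(stem_some_term("sharp"))      # Zsharp
```

The four results match the terms shown in the parsed queries above:
"xziis" for the hand-prefixed word, "ZXZii" for custom:iis, "XZiis" for
custom:IIS, and "Zsharp" for the plain lower-case word.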

Cheers,
    Olly


