[Xapian-discuss] STEM_SOME (was: Custom Stemming and QueryParser)
Olly Betts
olly at survex.com
Wed Aug 20 02:48:07 BST 2008
On Wed, Aug 13, 2008 at 12:55:46PM -0400, Mike Boone wrote:
> The issue appears to boil down to this. I am trying to parse a query
> with STEM_SOME set. It's described in the docs as "Search for stemmed
> forms of terms except for those which start with a capital letter".
>
> I am custom-stemming a few words, which are stored in the index
> prefixed with XZ.
>
> I tried to prefix these myself before sending them to the query
> parser, but they get stemmed anyway:
I should insert the standard warning here that it's generally not a good
idea to try to "adjust" the input to the QueryParser. Since it aims to
parse potentially free-form input from users as well as boolean
structure, you'd generally have to build the equivalent of QueryParser
and the equivalent of un-QueryParser to avoid unexpected handling of
some cases.

There ought to be a way to control parsing and manipulation of terms
by slotting bits of code into the QueryParser framework, but currently
there are just some settings like a stemmer object to use and the
stemming strategy.
> "XZiis AND sharp" (no quotes) gets parsed as Xapian::Query((xziis:(pos=1) AND
> Zsharp:(pos=2))). The first term should be XZiis.
No, the word the user specified is "XZiis". The "XZ" here isn't a
term-prefix - that's an implementation detail invisible to the user.
If this worked how you seem to want, then a query for "Swordplay" would
be interpreted as an "S" prefixed term and actually match "wordplay" in
the title instead!
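
The ambiguity can be illustrated with a small toy sketch. This is not
how Xapian's QueryParser works internally; `KNOWN_PREFIXES` and
`naive_split` are hypothetical names invented for the illustration, and
"S" is assumed to be the title prefix as in the example above.

```python
# Toy illustration only; this is not Xapian code.
# KNOWN_PREFIXES and naive_split are hypothetical names.
KNOWN_PREFIXES = ("XZ", "S")  # "S" assumed to mark title terms


def naive_split(word):
    """Treat a leading match against a known prefix as a term prefix."""
    for prefix in KNOWN_PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):]
    return "", word


for word in ("XZiis", "Swordplay", "sharp"):
    print(word, "->", naive_split(word))
# XZiis -> ('XZ', 'iis')
# Swordplay -> ('S', 'wordplay')
# sharp -> ('', 'sharp')
```

A parser that behaved this way would give the desired result for
"XZiis", but would also silently turn an ordinary word such as
"Swordplay" into a title search for "wordplay".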
> If I try to use add_prefix('custom','XZ'), "custom:iis AND sharp" (no
> quotes) is parsed as Xapian::Query((ZXZii:(pos=1) AND
> Zsharp:(pos=2))).
Yes, because "iis" starts with a lower-case "i", so with STEM_SOME we
stem it.
> What I'm trying to get is: Xapian::Query((XZiis:(pos=1) AND Zsharp:(pos=2)))
Try this:
custom:IIS AND sharp
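
Why the capitalised form helps can be sketched with a simplified model
of the STEM_SOME rule as described in this thread. It is not Xapian's
implementation and ignores the other cases the real parser handles;
`toy_stem` is a hypothetical stand-in for a real stemmer, and
`stem_some_term` is a name made up for this sketch.

```python
# Simplified model of the behaviour discussed in this thread; not Xapian code.
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if len(word) > 1 and word.endswith("s") else word


def stem_some_term(word, prefix=""):
    """Build the term for one query word under a STEM_SOME-like rule."""
    if word[:1].isupper():
        # Capitalised words are not stemmed: prefix + lowercased word.
        return prefix + word.lower()
    # Everything else is stemmed and marked with a leading "Z".
    return "Z" + prefix + toy_stem(word.lower())


print(stem_some_term("XZiis"))      # xziis
print(stem_some_term("iis", "XZ"))  # ZXZii
print(stem_some_term("IIS", "XZ"))  # XZiis
print(stem_some_term("sharp"))      # Zsharp
```

The first two lines reproduce the terms reported earlier in the thread,
and the third shows the unstemmed, prefixed term that "custom:IIS"
should produce.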
Cheers,
Olly