[Xapian-discuss] Blacklist stemming
Olly Betts
olly at survex.com
Thu Jun 11 06:38:56 BST 2009
On Sat, Jun 06, 2009 at 12:50:30AM +0300, Silviu-Ionut Ganceanu wrote:
> I need to modify the stemming for a couple of words (a blacklist) and use
> the usual Snowball stemmer for all the others.
>
> The "natural" way of doing it would be to derive from Stem and override
> operator ()... but I am using *python-bindings*. Would this be possible?
Not currently. The big problem is that it requires fairly major incompatible
API changes, so it's currently slated for the next major version. There's
a relevant ticket:
http://trac.xapian.org/ticket/186
And a branch with an experimental implementation:
http://trac.xapian.org/browser/branches/stemrefcnt
> If not I have two other solutions in mind:
>
> - add a custom stemmer to Xapian
That would work, and is probably simpler than the second idea.
> - write custom index and search methods in Python, using add_posting and
> hacks to modify the query tree, respectively
There isn't really a way to modify a query tree (query trees are immutable,
and there aren't methods to walk an existing tree so that you could build a
modified version). Doing your own query parsing is probably the way to
implement this approach.
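A minimal sketch of that second approach, assuming you tokenize and stem entirely in your own Python code rather than through Xapian's TermGenerator and QueryParser. `BLACKLIST`, `toy_stem`, `WORD_RE`, `postings` and `query_terms` are all hypothetical names; `toy_stem` only strips a trailing "s" so the example runs without Xapian installed, and in real use you would pass a Xapian stemmer object (for example `xapian.Stem("english")`, which is callable from the Python bindings) as the fallback instead.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical blacklist: each word maps to the exact term to index for it.
# Words not listed here fall through to the normal stemmer.
BLACKLIST: Dict[str, str] = {"news": "news", "universal": "universal"}

# Hypothetical tokenizer: runs of word characters.
WORD_RE = re.compile(r"\w+")


def toy_stem(word: str) -> str:
    """Stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def stem_with_blacklist(word: str, base: Callable[[str], str] = toy_stem) -> str:
    # Blacklisted words bypass the base stemmer entirely.
    if word in BLACKLIST:
        return BLACKLIST[word]
    return base(word)


def postings(text: str) -> List[Tuple[str, int]]:
    """(term, position) pairs, e.g. to feed to Document.add_posting(term, pos)."""
    return [(stem_with_blacklist(m.group().lower()), pos)
            for pos, m in enumerate(WORD_RE.finditer(text), start=1)]


def query_terms(query: str) -> List[str]:
    """Terms a hand-written query parser would then combine into a Query."""
    return [stem_with_blacklist(w.lower()) for w in WORD_RE.findall(query)]
```

With Xapian available, the indexing side might then be something like `for term, pos in postings(text): doc.add_posting(term, pos)`, and the search side something like `xapian.Query(xapian.Query.OP_OR, query_terms(q))`. The key point is that the same `stem_with_blacklist` runs at both index and query time, so "news" is never conflated with "new".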
> Neither solution is very appealing.
>
> What would be the easiest way to do it?
You could add a "words not to stem" feature to the Xapian::Stem class
(or equivalent functionality such as a "stem 'X' to 'Y'" exception
list). I think that would work.
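To illustrate the lookup order such an exception list might use, here is a sketch written in Python for readability. It is not Xapian's API: `ExceptionStemmer`, `no_stem`, `overrides` and `strip_ing` are hypothetical, and a real version would live in the C++ Xapian::Stem class so that TermGenerator and QueryParser pick it up automatically. A Python-side wrapper like this can't be handed to them today, for the reason given above, so on its own it is only usable with your own term generation.

```python
from typing import Callable, Dict, Iterable, Optional


class ExceptionStemmer:
    """Hypothetical wrapper: consult exception lists before a base stemmer."""

    def __init__(self, base: Callable[[str], str],
                 no_stem: Iterable[str] = (),
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self.base = base
        self.no_stem = frozenset(no_stem)       # "words not to stem"
        self.overrides = dict(overrides or {})  # "stem 'X' to 'Y'"

    def __call__(self, word: str) -> str:
        # Explicit X -> Y mappings take priority over everything else.
        if word in self.overrides:
            return self.overrides[word]
        # Protected words are returned unchanged.
        if word in self.no_stem:
            return word
        # Everything else gets the usual stemming.
        return self.base(word)


def strip_ing(word: str) -> str:
    # Toy base stemmer standing in for a Snowball stemmer.
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


stem = ExceptionStemmer(strip_ing, no_stem={"during"},
                        overrides={"mining": "mine"})
```

Here `stem("walking")` is stemmed normally to "walk", `stem("during")` is left alone instead of becoming "dur", and `stem("mining")` maps to "mine" instead of "min". Checking the override map first means a word that appears in both lists gets its explicit mapping.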
Cheers,
Olly