[Xapian-discuss] Blacklist stemming

Olly Betts olly at survex.com
Thu Jun 11 06:38:56 BST 2009


On Sat, Jun 06, 2009 at 12:50:30AM +0300, Silviu-Ionut Ganceanu wrote:
> I need to modify the stemming for a couple of words (a blacklist) and have
> all the others use the usual Snowball stemmer.
> 
> The "natural" way of doing it would be to derive from Stem and override
> operator ()... but I am using *python-bindings*. Would this be possible?

Not currently.  The big problem is that allowing Stem to be subclassed
requires fairly major incompatible API changes, so it's slated to wait
for the next major version.  There's a relevant ticket:

http://trac.xapian.org/ticket/186

And a branch with an experimental implementation:

http://trac.xapian.org/browser/branches/stemrefcnt

> If not I have two other solutions in mind:
> 
>    - add a custom stemmer to Xapian

That would work, and is probably simpler than the second idea.

>    - write custom index & search methods in python using add_posting & hacks
>    to modify the query tree respectively

There isn't really a way to modify a query tree (they're immutable, and
there aren't methods to read through an existing tree so you can build a
modified version).  Probably doing your own query parsing is the way to
implement this approach.
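
As a rough illustration of that second approach, here is a minimal Python
sketch in which the same "stem unless blacklisted" rule is applied both when
generating postings and when turning a query string into terms.  It is only
a sketch under stated assumptions: tokenize(), toy_stem(), BLACKLIST,
postings() and query_terms() are hypothetical names invented for this
example, and toy_stem() is a stand-in for a real Snowball stemmer.  Nothing
here calls Xapian; the comments note where add_posting and a Query would
plausibly fit in.

```python
import re

# Hypothetical blacklist: words that must be indexed and searched unstemmed.
BLACKLIST = {"news"}


def tokenize(text):
    """Split text into lowercase alphanumeric words (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(word):
    """Stand-in for a real stemmer: strips one trailing 's' from words longer than 3 letters."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def stem_unless_blacklisted(word):
    """Apply the stemmer unless the word is on the blacklist."""
    return word if word in BLACKLIST else toy_stem(word)


def postings(text):
    """Return (term, position) pairs for indexing, with positions starting at 1.

    With the real bindings, each pair could be passed to
    Document.add_posting(term, position).
    """
    return [
        (stem_unless_blacklisted(word), position)
        for position, word in enumerate(tokenize(text), start=1)
    ]


def query_terms(query_string):
    """Return the terms for a query, processed exactly as at index time.

    With the real bindings, these terms could then be combined into a
    Query, for example with an AND or OR operator.
    """
    return [stem_unless_blacklisted(word) for word in tokenize(query_string)]
```

The point of the design is simply that indexing and searching share one
function for mapping words to terms, so a blacklisted word such as "news"
is never reduced to "new" on either side.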

> Neither solution is very appealing.
> 
> What would be the easiest way to do it?

You could add a "words not to stem" feature to the Xapian::Stem class
(or equivalent functionality such as a "stem 'X' to 'Y'" exception
list).  I think that would work.
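
To make the idea concrete, here is a minimal Python sketch of what such an
exception list might look like when wrapped around an existing stemmer.
This is not an existing Xapian feature: ExceptionListStemmer and
toy_stemmer are hypothetical names for this example, and toy_stemmer is
only a stand-in for a real Snowball stemmer.  A wrapper like this helps
only in code that calls the stemmer itself; it does not change what
Xapian's own query parser does internally.

```python
class ExceptionListStemmer:
    """Wrap a stemmer callable with a "stem 'X' to 'Y'" exception list."""

    def __init__(self, base_stemmer, exceptions=None, unstemmed=()):
        # base_stemmer is any callable mapping a word to its stem.
        self._base = base_stemmer
        # Explicit "stem X to Y" mappings.
        self._exceptions = dict(exceptions or {})
        # "Words not to stem" are just exceptions that map to themselves.
        for word in unstemmed:
            self._exceptions[word] = word

    def __call__(self, word):
        # Exceptions take priority; everything else goes to the base stemmer.
        if word in self._exceptions:
            return self._exceptions[word]
        return self._base(word)


def toy_stemmer(word):
    """Stand-in for a real stemmer: strips one trailing 's' from words longer than 3 letters."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


stem = ExceptionListStemmer(
    toy_stemmer,
    exceptions={"mice": "mouse"},  # stem 'X' to 'Y'
    unstemmed=["news"],            # words not to stem
)
```

In the Python bindings a xapian.Stem object is itself callable, so it
could presumably be passed in as base_stemmer in place of toy_stemmer;
depending on the bindings version, its result may be bytes rather than str.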

Cheers,
    Olly


