[Xapian-devel] non-snowball stemmer

Olly Betts olly at survex.com
Sat Jan 13 02:31:24 GMT 2007


On Tue, Jan 09, 2007 at 03:30:48PM +0000, Richard Boulton wrote:
> Your patch should be against the current Xapian subversion head, if 
> possible.  Also, note that Xapian's snowball stemmers are quite out of 
> date - in particular, they don't use the UTF-8 encoding.  The stemmers 
> are due to be updated for the next release (ie, the 1.0 release - they 
> couldn't be updated for the 0.9.x series, because this would have broken 
> existing databases).

I've mostly finished updating Xapian to use UTF-8 versions of the
snowball stemmers, and have rewritten Xapian::Stem::Internal()
completely, so current SVN HEAD isn't a good place to start for
patching this area right now.

The new design actually makes it very easy to hook in stemming
algorithms which aren't based on Snowball.

> Also, if the stemmer is available under a GPL compatible license, it 
> would be nice to work out instructions on how to download it and use it. 

It's LGPL.

> Perhaps you could add information to the Xapian wiki somewhere? 
> Unfortunately, I don't read Russian, so couldn't get much useful 
> information from the page you linked to.

Google Translate does a reasonable job on the site:

http://translate.google.com/translate?u=http%3A%2F%2Fwww.aot.ru%2F&langpair=ru%7Cen

Cheers,
    Olly


