[Xapian-devel] non-snowball stemmer
Olly Betts
olly at survex.com
Sat Jan 13 02:31:24 GMT 2007
On Tue, Jan 09, 2007 at 03:30:48PM +0000, Richard Boulton wrote:
> Your patch should be against the current Xapian subversion head, if
> possible. Also, note that Xapian's snowball stemmers are quite out of
> date - in particular, they don't use the UTF-8 encoding. The stemmers
> are due to be updated for the next release (i.e., the 1.0 release - they
> couldn't be updated for the 0.9.x series, because this would have broken
> existing databases).
I've mostly finished updating Xapian to use UTF-8 versions of the
snowball stemmers, and have rewritten Xapian::Stem::Internal()
completely, so current SVN HEAD isn't a good place to start for
patching this area right now.
The new design actually makes it very easy to hook in stemming
algorithms which aren't based on Snowball.
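To illustrate the general idea of a pluggable stemmer, here is a minimal C++ sketch. It is not the actual Xapian code: the names StemImpl, TrailingSStem and Stemmer are hypothetical, and the toy algorithm (dropping one trailing "s") only stands in for a real non-Snowball stemmer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: every stemming algorithm, Snowball-based or not,
// would implement this.
class StemImpl {
public:
    virtual ~StemImpl() = default;
    // Return the stemmed form of a single word.
    virtual std::string operator()(const std::string& word) const = 0;
    // Short name identifying the algorithm.
    virtual std::string description() const = 0;
};

// Toy non-Snowball stemmer, for illustration only: strips one trailing 's'
// from words longer than two characters.
class TrailingSStem : public StemImpl {
public:
    std::string operator()(const std::string& word) const override {
        if (word.size() > 2 && word.back() == 's')
            return word.substr(0, word.size() - 1);
        return word;
    }
    std::string description() const override { return "trailing-s"; }
};

// Front end used by callers. It only forwards to whichever implementation
// was plugged in, so adding a new algorithm needs no change here.
class Stemmer {
    std::shared_ptr<const StemImpl> impl_;
public:
    explicit Stemmer(std::shared_ptr<const StemImpl> impl)
        : impl_(std::move(impl)) {
        assert(impl_ && "a stemmer implementation is required");
    }
    std::string operator()(const std::string& word) const {
        return (*impl_)(word);
    }
    std::string description() const { return impl_->description(); }
};
```

With something along these lines, a caller would construct Stemmer stem(std::make_shared<TrailingSStem>()) and call stem(word); a different algorithm would be hooked in by passing another StemImpl subclass.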
> Also, if the stemmer is available under a GPL compatible license, it
> would be nice to work out instructions on how to download it and use it.
It's LGPL.
> Perhaps you could add information to the Xapian wiki somewhere?
> Unfortunately, I don't read Russian, so couldn't get much useful
> information from the page you linked to.
Google translate does a reasonable job on the site:
http://translate.google.com/translate?u=http%3A%2F%2Fwww.aot.ru%2F&langpair=ru%7Cen
Cheers,
Olly