[Xapian-devel] Bitsize project - Krovetz stemmer

James Aylett james-xapian at tartarus.org
Tue Feb 10 17:52:22 GMT 2015


On 10 Feb 2015, at 17:35, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:

> I was going through the bitesize projects and found the Krovetz English stemmer, and I would really like to work on it.
> But I have a few doubts.
> Though implementing the Krovetz stemmer isn't very hard, Xapian's stemmers are written in Snowball.
> But the Krovetz stemmer doesn't seem to be straightforward to implement in Snowball.

Xapian has an abstraction layer which would allow you to implement the Krovetz stemmer alongside Snowball stemmers.
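
As a rough sketch of the shape this takes: in Xapian the abstraction is (as far as I recall) the Xapian::StemImplementation class, which a Xapian::Stem object can wrap. To keep the example compilable without Xapian installed, the code below uses a stand-in base class with the same two-method shape. StemImplementationSketch and KrovetzStem are hypothetical names, and the stemming body is only a placeholder.

```cpp
#include <string>

// Stand-in for the stemmer interface (hypothetical; modelled on the shape of
// Xapian::StemImplementation so this compiles without Xapian).
struct StemImplementationSketch {
    virtual ~StemImplementationSketch() {}
    // Stem a single word and return the result.
    virtual std::string operator()(const std::string& word) = 0;
    // Human-readable description of the stemmer.
    virtual std::string get_description() const = 0;
};

// Hypothetical Krovetz stemmer plugged in behind that interface.
class KrovetzStem : public StemImplementationSketch {
  public:
    std::string operator()(const std::string& word) override {
        // Placeholder: returns the word unchanged. The Krovetz steps
        // (plurals, past tense, "-ing", dictionary lookups) would go here.
        return word;
    }

    std::string get_description() const override {
        return "KrovetzStem (sketch)";
    }
};
```

Callers only ever see the base interface, so a stemmer written this way can sit next to the Snowball-generated ones without either knowing about the other.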

> Also, though this is a dictionary-based stemmer, the original paper doesn't give us pointers on how to create the dictionary. I think this can be overcome by looking at available implementations of the stemmer, though.

The dictionary should probably be configurable, so that we aren’t forcing people to use our recoding rules. To an extent that also means we don’t have to worry immediately about having a good recoding list, so you can focus on building the stemmer itself first.
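
One way a configurable dictionary could be loaded, purely as an illustration. The file format here (one entry per line, either "word" or "word root") is an assumption made up for this sketch, as are the names KrovetzDictionary and load_dictionary; nothing like this exists in Xapian today.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Maps a word to its root form. Hypothetical type for this sketch.
typedef std::map<std::string, std::string> KrovetzDictionary;

// Read a dictionary from a stream.
// Assumed format: one entry per line, "word" or "word root".
// A word with no root listed is treated as already being a root form.
KrovetzDictionary load_dictionary(std::istream& in) {
    KrovetzDictionary dict;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string word, root;
        if (!(fields >> word)) continue;     // skip blank lines
        if (!(fields >> root)) root = word;  // no recoding given
        dict[word] = root;
    }
    return dict;
}
```

Taking a stream rather than a filename means the stemmer can be developed against a tiny in-memory list first, and a real recoding list can be swapped in later.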

> As of now, I am planning to start by writing Snowball code for plural forms of words.

Because of the structure of the Krovetz algorithm, I’m not sure that using Snowball for the individual transform steps is going to be particularly easy. You might be better off doing a straight implementation, from the paper, of the entire algorithm (in C or C++). (I could be wrong, but the algorithm feels sufficiently small that integrating one or more Snowball stemmers in the middle of it might be more confusing than writing the whole thing in C.)
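
To give a feel for how small a straight implementation of one step can be, here is a simplified plural step in C++. It is not a faithful transcription of the paper: the suffix rules and the length cutoff are assumptions for illustration, and strip_plural and ends_with are hypothetical names. What it does show is the pattern of trying a candidate ending and accepting it if the dictionary knows the result.

```cpp
#include <set>
#include <string>

// Returns true if `word` ends with `suffix`.
static bool ends_with(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Simplified plural step (illustrative only, not the paper's exact rules).
std::string strip_plural(const std::string& word,
                         const std::set<std::string>& dict) {
    // Leave short words and words already in the dictionary alone.
    if (word.size() <= 3 || dict.count(word)) return word;

    if (ends_with(word, "ies")) {
        std::string stem = word.substr(0, word.size() - 3);
        if (dict.count(stem + "ie")) return stem + "ie";  // calories -> calorie
        return stem + "y";                                // ponies -> pony
    }
    if (ends_with(word, "es")) {
        std::string stem = word.substr(0, word.size() - 2);
        if (dict.count(stem + "e")) return stem + "e";    // houses -> house
        if (dict.count(stem)) return stem;                // boxes -> box
        return stem + "e";                                // fallback guess
    }
    if (ends_with(word, "s") && !ends_with(word, "ss") &&
        !ends_with(word, "us")) {
        return word.substr(0, word.size() - 1);           // cats -> cat
    }
    return word;
}
```

The dictionary checks sit in the middle of each branch, which is the part that looks awkward to express as a chain of Snowball stemmers.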

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



