[Xapian-devel] Bitsize project: Krovetz Stemmer

James Aylett james-xapian at tartarus.org
Sun Feb 15 17:05:11 GMT 2015


On 15 Feb 2015, at 07:04, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:

> Since [Krovetz] is a dictionary-based stemmer, I'm having problems deciding how to create the dictionary.

Richhiey — I think I recommended that you load any dictionaries you need from a file, which could be specified when constructing the stemmer. That separates the need to create the dictionary from implementing the feature, although we’ll have to provide some initial dictionary eventually.
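As a rough illustration of "load the dictionary from a file specified at construction", here is a minimal sketch. The class name `KrovetzStemmerSketch`, its methods, and the one-word-per-line file format are all hypothetical assumptions for illustration, not part of Xapian's API. A second constructor takes any `std::istream`, which keeps the loading logic testable without touching the filesystem.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>   // for std::istringstream in the usage example
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical sketch: names and file format (one word per line) are
// assumptions, not Xapian's API.
class KrovetzStemmerSketch {
  public:
    // Load the dictionary from a file whose path the caller supplies.
    explicit KrovetzStemmerSketch(const std::string& dict_path) {
        std::ifstream in(dict_path);
        if (!in) {
            throw std::runtime_error("cannot open dictionary: " + dict_path);
        }
        load(in);
    }

    // Load from any input stream (useful for tests).
    explicit KrovetzStemmerSketch(std::istream& in) { load(in); }

    bool in_dictionary(const std::string& word) const {
        return words_.count(word) != 0;
    }

    std::size_t dictionary_size() const { return words_.size(); }

  private:
    void load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            // Tolerate Windows line endings and skip blank lines.
            if (!line.empty() && line.back() == '\r') line.pop_back();
            if (!line.empty()) words_.insert(line);
        }
    }

    std::unordered_set<std::string> words_;
};
```

For example, `std::istringstream in("cats\nrunning\n"); KrovetzStemmerSketch s(in);` loads two words, while constructing from a path that cannot be opened throws `std::runtime_error`. Keeping the file path as a constructor argument means the choice of which dictionary to ship can be made later.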

How you then structure that in your code as you load it from file and later use it is entirely up to you. If it's just a list of words that should be treated specially, having a class to represent each word feels like overkill; you can probably do it with something like an STL container of a basic_string of some sort (std::wstring? I haven't done much Unicode work in C++, so others may want to jump in and correct me here).
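To show how little structure a plain container needs, here is a sketch that looks words up in a `std::unordered_set<std::string>`. It assumes UTF-8 text in `std::string`, which is what Xapian's own API uses for terms, rather than `std::wstring`. The function `strip_plural_if_known` is hypothetical and deliberately simplified: it is in the spirit of a dictionary-based stemmer, not the actual Krovetz algorithm.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical, simplified rule (NOT the real Krovetz algorithm): strip a
// trailing "s" only if the dictionary confirms the shorter form is a word.
std::string strip_plural_if_known(const std::unordered_set<std::string>& dict,
                                  const std::string& word) {
    // Checking the last byte against ASCII 's' is safe on UTF-8 text,
    // because bytes below 0x80 never occur inside a multi-byte sequence.
    if (word.size() > 1 && word.back() == 's') {
        std::string candidate = word.substr(0, word.size() - 1);
        if (dict.count(candidate) != 0) {
            return candidate;
        }
    }
    // Otherwise leave the word unchanged.
    return word;
}
```

With a dictionary containing `"cat"`, the input `"cats"` maps to `"cat"`, while `"dogs"` is returned unchanged because `"dog"` is not in the dictionary. An `std::unordered_set` gives average constant-time lookups; an `std::set` would also work if ordered iteration were ever needed.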

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org
