[Xapian-devel] contribution to "Add more stemming algorithms"

James Aylett james-xapian at tartarus.org
Tue Feb 18 14:32:16 GMT 2014


On 18 Feb 2014, at 14:08, "Hurricane Tong" <zhangshangtong.cpp at qq.com> wrote:

> I am trying to contribute to the "bite-size" project, "Add more stemming algorithms".
> I have implemented the Lancaster (Paice/Husk) stemming algorithm by building a class named StemLancaster that extends
> StemImplementation, following the guide at http://www.comp.lancs.ac.uk/computing/research/stemming/index.htm.
> I think this class could be added to the default API for potential users who are interested in this algorithm.

Hi, that sounds like a good approach to getting familiar with Xapian, the build system &c.

> The source code is at https://github.com/HurricaneTong/Xapian. Could you give me some suggestions on it, and could this code be added to the Xapian source after the necessary modifications?

Either this will want integrating into the Xapian codebase, or it will need its own build system and tests. For something this size, I'd think that integrating it is reasonable. For this, you'll want to fork Xapian on GitHub, integrate your code into it, and then issue a pull request (which provides ways for us to comment directly on the code line by line).

Before you do that, please read:

https://github.com/xapian/xapian/blob/master/xapian-core/HACKING

which talks about coding style (there are some changes you'll want to make), licensing statements and other pieces that we like to see for submissions. Crucially, we don't want to merge changes that do not have supporting tests, or that are not documented. It looks like you have some API documentation for your code, but there will need to be something in docs/stemming.rst. Tests should be added to tests/api_stem.cc and tests/stemtest.cc: you want to ensure that constructing a Lancaster stemmer by name, such as Xapian::Stem st("lancaster"), will work, but also that running the stemmer produces the expected results. We do this for existing stemmers using xapian-data/stemming (which is used by tests/stemtest.cc); you'll need a word list and expected output, which the Lancaster stemmer's reference material may already provide?
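
To make the "word list and expected output" idea concrete, here is a rough, self-contained sketch of that kind of comparison. It is not the code in tests/stemtest.cc: count_stem_mismatches and toy_stem are hypothetical names invented for this illustration, and toy_stem is a stand-in that only strips a trailing "s", not the Lancaster algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): counts the words for which
// `stem` does not produce the corresponding expected output.
std::size_t count_stem_mismatches(
    const std::function<std::string(const std::string&)>& stem,
    const std::vector<std::string>& words,
    const std::vector<std::string>& expected)
{
    // The word list and the expected-output list must line up one to one.
    assert(words.size() == expected.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (stem(words[i]) != expected[i]) ++mismatches;
    }
    return mismatches;
}

// Toy stand-in stemmer for illustration only: strips one trailing 's'.
std::string toy_stem(const std::string& word)
{
    if (word.size() > 1 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}
```

With a real stemmer, the callable could be a lambda wrapping a Xapian::Stem object, since Xapian::Stem can be called on a std::string to get the stemmed form. A stemmer that is working correctly should give zero mismatches against its reference data.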

Also check out <http://xapian.org/docs/tests.html>, which talks about how to write tests.

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



