[Xapian-devel] Hi,

Olly Betts olly at survex.com
Thu Aug 25 12:26:18 BST 2011


On Thu, Aug 25, 2011 at 04:01:30PM +0530, Aman (neshu) Agarwal wrote:
> If I am not wrong to add stemming I need to  edit files in
> "xapian/xapian-core/languages" let say I want to add new stemming algorithm
> for english, so in that case I need to make changes in english.cc

If you're adding a new algorithm, create a new file (or files) for it.

There are actually 3 English stemming algorithms there already:

* english.cc generated from english.sbl, which is the Snowball English
  stemmer

* porter.cc generated from porter.sbl, which is the Porter stemmer - an
  older version of english.sbl, included for compatibility mostly

* lovins.cc generated from lovins.sbl, which is Lovins' algorithm

If you're adding one written in Snowball (http://snowball.tartarus.org/)
then you'd add the .sbl file and update the list in
languages/Makefile.mk and the .cc version would be generated
automatically.  If you're adding an algorithm coded by hand in C or C++
then you'd just add that file (and add it to languages/Makefile.mk too).

Then in stem.cc there's a big switch statement which determines which
algorithm to use when constructing a new Stem object, so you need to
hook up your new algorithm there.
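
To illustrate the kind of hook-up meant here, below is a minimal,
self-contained sketch of dispatching on a name to pick a stemmer.  It is
not the actual code from stem.cc: StemmerSketch, IdentityStemmer,
ToyPluralStemmer, make_stemmer and the names "identity" and "toyplural"
are all hypothetical, and the real class names and switch layout in
xapian-core may well differ.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical interface standing in for whatever stem.cc really uses.
struct StemmerSketch {
    virtual ~StemmerSketch() {}
    virtual std::string stem(const std::string& word) const = 0;
};

// Stand-in for an algorithm that is already wired up: returns the word as-is.
struct IdentityStemmer : StemmerSketch {
    std::string stem(const std::string& word) const override { return word; }
};

// Stand-in for the newly added algorithm: strips a single trailing 's'.
struct ToyPluralStemmer : StemmerSketch {
    std::string stem(const std::string& word) const override {
        if (word.size() > 1 && word.back() == 's')
            return word.substr(0, word.size() - 1);
        return word;
    }
};

// Pick an implementation from the requested name.  Switching on the first
// character and then comparing the full name is an assumption made for
// this sketch, not a description of the real code.
std::unique_ptr<StemmerSketch> make_stemmer(const std::string& name) {
    if (!name.empty()) {
        switch (name[0]) {
            case 'i':
                if (name == "identity")
                    return std::unique_ptr<StemmerSketch>(new IdentityStemmer);
                break;
            case 't':
                // The new case: this is the part you would add.
                if (name == "toyplural")
                    return std::unique_ptr<StemmerSketch>(new ToyPluralStemmer);
                break;
        }
    }
    throw std::invalid_argument("Unknown stemmer name: " + name);
}
```

With this sketch, make_stemmer("toyplural")->stem("cats") gives "cat",
and an unrecognised name throws rather than silently falling back to
some other algorithm.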

> P.S. I compiled the developer version successfully

Cool.  Is the solution something useful to share, or was it just a local
issue?

Cheers,
    Olly


