[Xapian-devel] Paice-Husk Stemmer

Satwant Rana sat8003wantrana at gmail.com
Mon Mar 31 20:25:33 BST 2014


Hi everyone,

I have been working on the Paice-Husk Stemmer, which is a Bite Size Project
for Xapian, and I have created a C++ version as well as a Snowball version of it.

I took the algorithm description and the rules from here:
http://www.comp.lancs.ac.uk/computing/research/stemming/paice/descript.htm

The C++ code reads the rules from a file and generates the stem of a
given word, whereas the Snowball version has the rules written into it. This
is because file handling is not possible in Snowball, so I have written a
C++ program that generates the Snowball code (Code-ception :P).
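
To make the rule handling concrete, here is a minimal sketch of how a single
rule could be parsed and applied in C++. It is not the code from the
repository: the names Rule, parse_rule and apply_rule are hypothetical, and it
assumes the usual textual rule form (reversed ending, optional '*' for
"intact words only", one digit giving the number of letters to remove, an
optional string to append, then '>' to continue or '.' to stop). The
acceptability checks on the resulting stem are left out.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical in-memory form of one rule, e.g. "sei3y>".
struct Rule {
    std::string ending;   // suffix to match, stored un-reversed ("ies")
    bool intact_only;     // '*' flag: only apply to a not-yet-stemmed word
    int remove_count;     // number of trailing letters to delete
    std::string append;   // letters to append after the deletion
    bool continue_after;  // '>' = keep stemming, '.' = stop
};

// Parse the assumed textual form: reversed ending, optional '*',
// one digit, optional append string, then '>' or '.'.
Rule parse_rule(const std::string& text) {
    Rule r{};
    std::size_t i = 0;
    while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i])))
        r.ending += text[i++];
    std::reverse(r.ending.begin(), r.ending.end());
    if (i < text.size() && text[i] == '*') {
        r.intact_only = true;
        ++i;
    }
    if (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        r.remove_count = text[i] - '0';
        ++i;
    }
    while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i])))
        r.append += text[i++];
    r.continue_after = (i < text.size() && text[i] == '>');
    return r;
}

// Try to apply one rule to `word`; `intact` says whether the word is still
// unmodified. Returns true if the word was changed.
bool apply_rule(const Rule& r, std::string& word, bool intact) {
    if (r.intact_only && !intact) return false;
    if (word.size() < r.ending.size()) return false;
    if (word.compare(word.size() - r.ending.size(), r.ending.size(),
                     r.ending) != 0)
        return false;
    if (static_cast<std::size_t>(r.remove_count) > word.size()) return false;
    word.erase(word.size() - static_cast<std::size_t>(r.remove_count));
    word += r.append;
    return true;
}
```

With these helpers, parse_rule("sei3y>") applied to "parties" would give
"party", and the caller would loop over the rules until one ends in '.' or
no rule matches.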

Since the algorithm has many steps, my code might contain some mistakes.

Both versions are located here: https://github.com/satwantrana/codes

I will be integrating this into my Xapian fork and will release a patch soon.
Meanwhile, if someone finds a bug or mistake in it, please respond.

Also, I hope this implementation helps my GSoC application.

Thanks,
Satwant Rana