[Xapian-discuss] best method for stemming

Dobrica Pavlinusic dpavlin at rot13.org
Wed Feb 8 18:38:34 GMT 2006

On Wed, Feb 08, 2006 at 12:18:28PM +0000, Olly Betts wrote:
> Alternatively, you could stem nothing at index time and then for search
> terms which you want to stem, stem them, and then run them through an
> "unstemming" algorithm to produce a list of terms they could have come
> from.  Then OR this list together.  Unfortunately nobody has written
> the "unstemmer" yet.  Also this means more work at search time than
> the first approach, but that may not really matter.  I've not tried
> the idea, so I can't say for sure.

I have written a Perl module that produces alternative spellings from
ispell data files. It's available at


I mainly use it to index Croatian, where we don't have a stemmer. I
store all words unstemmed and then expand the query to all variants to
catch them. Croatian is very irregular, but this works very well for me.
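
To illustrate the general idea of storing every word unstemmed and
expanding the query at search time, here is a minimal Python sketch. It
is not the Perl module mentioned above and does not read ispell data:
the suffix list, `toy_stem`, `build_unstem_table` and
`expand_query_term` are all hypothetical names invented for this
example, and the English suffix stripping is only a stand-in for a real
stemmer or affix file.

```python
from collections import defaultdict

# Hypothetical stand-in for a real stemmer: strips a few English suffixes.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Return a crude stem by removing the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_unstem_table(indexed_terms):
    """Map each stem to the set of unstemmed terms that produce it."""
    table = defaultdict(set)
    for term in indexed_terms:
        table[toy_stem(term)].add(term.lower())
    return table


def expand_query_term(term, table):
    """Return every indexed variant sharing the query term's stem."""
    variants = table.get(toy_stem(term), set()) | {term.lower()}
    return sorted(variants)


# Terms as they were stored in the index, with no stemming applied.
indexed_terms = ["search", "searches", "searched", "searching", "words"]
table = build_unstem_table(indexed_terms)

# Expand one query term and OR the variants together.
variants = expand_query_term("searching", table)
or_query = " OR ".join(variants)
print(or_query)
```

The reverse table here plays the role of the "unstemmer" from the
quoted message: it can only return variants that actually occur in the
index, which keeps the ORed list short. With the Xapian bindings the
variant list would presumably be combined with an OP_OR query rather
than joined into a string.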

Dobrica Pavlinusic               2share!2flame            dpavlin at rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin
