[Xapian-discuss] best method for stemming

Dobrica Pavlinusic dpavlin at rot13.org
Wed Feb 8 18:38:34 GMT 2006


On Wed, Feb 08, 2006 at 12:18:28PM +0000, Olly Betts wrote:
> Alternatively, you could stem nothing at index time and then for search
> terms which you want to stem, stem them, and then run them through an
> "unstemming" algorithm to produce a list of terms they could have come
> from.  Then OR this list together.  Unfortunately nobody has written
> the "unstemmer" yet.  Also this means more work at search time than
> the first approach, but that may not really matter.  I've not tried
> the idea, so I can't say for sure.

I have written a Perl module that produces alternative spellings from
ispell data files. It's available at

http://search.cpan.org/~dpavlin/Lingua-Spelling-Alternative/

I mainly use it to index Croatian, for which we don't have a stemmer. I
store all words as they are (unstemmed) and then expand each query to all
of its variants to catch them. Croatian is very irregular, but this works
very well for me.
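
To make the idea concrete, here is a rough sketch of the query expansion
step in Python. It is not the code from my module: SUFFIX_RULES is a
hypothetical, hand-written stand-in for what would really be read from
ispell affix data, the names alternatives() and expand_query() are made up
for the example, and combining the per-word groups with AND is an
assumption. Variants are kept only if they actually occur in the index
vocabulary, and the survivors are ORed together.

```python
# Hypothetical suffix swaps standing in for rules read from ispell affix data.
# Each pair means: a word ending in the first suffix may also appear
# with that ending replaced by the second suffix.
SUFFIX_RULES = [("a", "e"), ("a", "i"), ("a", "u"), ("a", "om")]


def alternatives(word, rules=SUFFIX_RULES):
    """Return the word itself plus every variant the rules can produce."""
    variants = {word}
    for old, new in rules:
        if word.endswith(old):
            variants.add(word[: len(word) - len(old)] + new)
    return variants


def expand_query(words, vocabulary, rules=SUFFIX_RULES):
    """Build a boolean query string from unstemmed query words.

    Each word becomes an OR group of its variants that exist in the
    index vocabulary; groups are joined with AND (an assumption).
    """
    groups = []
    for word in words:
        found = sorted(alternatives(word, rules) & set(vocabulary))
        # If no variant is indexed, keep the original word so the
        # query still reflects what the user typed.
        groups.append(found or [word])
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


if __name__ == "__main__":
    indexed_words = {"knjiga", "knjige", "knjigu", "grad"}
    print(expand_query(["knjiga"], indexed_words))
```

Filtering against the vocabulary keeps the ORed list short, which matters
because all of the extra work happens at search time.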

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin at rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin



