[Xapian-discuss] best method for stemming
dpavlin at rot13.org
Wed Feb 8 18:38:34 GMT 2006
On Wed, Feb 08, 2006 at 12:18:28PM +0000, Olly Betts wrote:
> Alternatively, you could stem nothing at index time and then for search
> terms which you want to stem, stem them, and then run them through an
> "unstemming" algorithm to produce a list of terms they could have come
> from. Then OR this list together. Unfortunately nobody has written
> the "unstemmer" yet. Also this means more work at search time than
> the first approach, but that may not really matter. I've not tried
> the idea, so I can't say for sure.
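A rough sketch of the "unstemming" idea quoted above, in Python. It assumes you can enumerate every unstemmed term already in the index (for example from the vocabulary you fed in at index time). It then groups those terms by stem, so a stemmed query term maps back to the list of indexed terms to OR together. The crude_stem function is a hypothetical toy stand-in for a real stemmer and is not Xapian's API.

```python
from collections import defaultdict


def crude_stem(word):
    # Hypothetical toy stemmer: strips a few English suffixes.
    # Stand-in only; a real system would use a proper stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_unstem_map(vocabulary, stem=crude_stem):
    """Map each stem to the set of unstemmed index terms that produce it."""
    unstem = defaultdict(set)
    for term in vocabulary:
        unstem[stem(term)].add(term)
    return unstem


def expand_query_term(term, unstem, stem=crude_stem):
    """Return the sorted list of indexed terms to OR together for `term`.

    Unknown terms fall back to themselves. .get() does not add entries
    to the defaultdict.
    """
    return sorted(unstem.get(stem(term), {term}))


if __name__ == "__main__":
    vocab = ["connect", "connected", "connecting", "connects", "cat", "cats"]
    unstem = build_unstem_map(vocab)
    print(" OR ".join(expand_query_term("connecting", unstem)))
```

The resulting list could then be OR-ed together in whatever query layer you use. The map is built once from the indexed vocabulary, so the extra search-time cost is a single dictionary lookup per query term plus the wider OR query.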
I have written a module in Perl that produces alternative spellings from
ispell data files. It's available at
I mainly use it to index Croatian, for which we don't have a stemmer. I
store all words unstemmed and then expand the query to all variants to
catch them. Croatian is very irregular, but this works very well for me.
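A minimal sketch of the index-everything, expand-the-query approach described above, in Python rather than the Perl module mentioned. It assumes a hypothetical, simplified suffix-rule table in the spirit of ispell affix flags. Each rule has a condition on the base word, a suffix to strip and a suffix to add. Real ispell affix files have their own syntax, and parsing them is not shown. The flag name "A" and its rules are illustrative only.

```python
import re

# Hypothetical simplified suffix rules, loosely modelled on ispell affix flags:
# flag -> list of (condition regex on base word, suffix to strip, suffix to add).
SUFFIX_RULES = {
    "A": [
        (r"a$", "a", "e"),
        (r"a$", "a", "i"),
        (r"a$", "a", "u"),
        (r"a$", "a", "om"),
        (r"a$", "a", "ama"),
    ],
}


def expand(base, flags, rules=SUFFIX_RULES):
    """Return the base word plus every variant its affix flags generate."""
    forms = {base}
    for flag in flags:
        for condition, strip, add in rules.get(flag, []):
            if re.search(condition, base) and base.endswith(strip):
                forms.add(base[: len(base) - len(strip)] + add)
    return sorted(forms)


def build_variant_index(dictionary):
    """Map every generated form to the full variant set of its base word."""
    index = {}
    for base, flags in dictionary:
        forms = expand(base, flags)
        for form in forms:
            index.setdefault(form, set()).update(forms)
    return index


def expand_query(word, index):
    """Return all spellings to OR together; unknown words map to themselves."""
    return sorted(index.get(word, {word}))


if __name__ == "__main__":
    index = build_variant_index([("žena", "A"), ("grad", "")])
    print(" OR ".join(expand_query("ženom", index)))
```

Because every surface form is indexed as-is, no stemmer is needed at index time. A query for any one form, such as "ženom", is widened to all forms generated from the same dictionary entry.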
Dobrica Pavlinusic 2share!2flame dpavlin at rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin