[Xapian-discuss] Stemming

Jean-Francois Dockes jean-francois.dockes at wanadoo.fr
Thu Feb 10 13:16:48 GMT 2005


(subject: where to store the (stem to words) relationship)
James Aylett writes:
 > I'd advise /either/ having a different database for it (so you don't
 > need STM:stemvalue, just 'stemvalue') /or/ just using the stemmed
 > terms to index the documents, but add in another term which you can
 > filter on the /lack/ of for normal searches.

Thanks a lot. I implemented the separate-database approach. Better to keep
it simple, as the stem database is very small in practice (many terms
stem to themselves, or have no other terms that stem to the same value, and
so do not need an entry). In fact it's so small that I could store
precomputed versions for several languages.
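To illustrate why the stem database stays small, here is a minimal sketch in Python. It is not the actual implementation: toy_stem is a hypothetical stand-in for a real stemmer, build_stem_map is an invented name, and the pruning rule (keep a stem only when more than one indexed term maps to it) is one reading of the description above.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_map(terms, stem=toy_stem):
    """Map each stem to the indexed terms sharing it, dropping trivial entries."""
    groups = defaultdict(set)
    for term in terms:
        groups[stem(term)].add(term)
    # A stem with a single term needs no entry: the term can be searched as-is.
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}


terms = ["walk", "walks", "walking", "dog", "cat", "cats"]
stem_map = build_stem_map(terms)
print(stem_map)
```

Here "dog" gets no entry at all, which is the case that keeps the stored map much smaller than the term list.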

I guess that cross-language stemming is going to produce strange results at
times, but it's more or less bound to happen if the user mixes documents
in different languages, which is probably the general case.

 > On the other hand, search-time stemming and query expansion gives you
 > advantages in not needing to detect the language of everything you
 > stem right now. For a personal search tool, that might be a big bonus.

Yes, I am not sure how useful it will be, but it does seem nice to be able to
turn stemming on/off or change languages at query time, on the fly. 
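A rough sketch of what switching at query time could look like, assuming one precomputed stem map per language. Every name here (make_stemmer, expand_query, the sample maps and the suffix lists) is hypothetical and only for illustration; a real setup would use proper stemmers and read the maps from the stem databases.

```python
def make_stemmer(suffixes):
    """Build a hypothetical toy stemmer that strips the first matching suffix."""
    def stem(word):
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
    return stem


# Illustrative data only: one stemmer and one stem map per language.
stemmers = {
    "english": make_stemmer(("ing", "s")),
    "french": make_stemmer(("ons", "er")),
}
stem_maps = {
    "english": {"walk": ["walk", "walking", "walks"]},
    "french": {"chant": ["chanter", "chantons"]},
}


def expand_query(words, lang=None):
    """Return one list of alternative terms per query word.

    lang=None means stemming is switched off for this query.
    """
    if lang is None:
        return [[w] for w in words]
    stem, stem_map = stemmers[lang], stem_maps[lang]
    # Always keep the word as typed, even if it is absent from the map.
    return [sorted(set(stem_map.get(stem(w), [])) | {w}) for w in words]


print(expand_query(["walking", "dog"], lang="english"))
print(expand_query(["walking", "dog"]))
print(expand_query(["chantons"], lang="french"))
```

Each inner list would then be OR-ed together when building the actual query, so the index itself never has to know which language, if any, was chosen.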

Regards,
J.F. Dockes


