[Xapian-discuss] Multilingual issues with Xapian

James Aylett james-xapian at tartarus.org
Thu Oct 11 10:36:39 BST 2007


On Thu, Oct 11, 2007 at 02:09:10AM +0200, Ron Kass wrote:

> What if, instead of stemming all the words in a document, even if they have 
> no real stemmed form, the stemmer (during indexing) were to stem only words 
> that it knows to have a stemmed form?

Wouldn't you need a dictionary of stemmed forms for that? At which
point you might as well use a dictionary approach to stemming, which
can (with lots of work) give you better stemming anyway. The problem
is that with algorithmic stemming, *everything* has a stemmed form,
even if it isn't useful.
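
To make the difference concrete, here is a toy sketch in Python. It is
only an illustration: the suffix list and the word-to-stem dictionary are
made up for this example, and the suffix stripper is far cruder than the
Snowball stemmers Xapian actually uses. It shows the algorithmic approach
producing a "stem" for every input, while the dictionary approach only
touches words it knows about.

```python
# Toy illustration only: a crude suffix stripper, NOT Xapian's Snowball stemmer.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list


def algorithmic_stem(word):
    """Strip the first matching suffix; every input gets some output."""
    lower = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if lower.endswith(suffix) and len(lower) > len(suffix) + 2:
            return lower[: -len(suffix)]
    return lower


# Hypothetical dictionary of known word -> stem pairs.
STEM_DICTIONARY = {"running": "run", "indexes": "index", "stemmed": "stem"}


def dictionary_stem(word):
    """Stem only words found in the dictionary; leave the rest unchanged."""
    lower = word.lower()
    return STEM_DICTIONARY.get(lower, lower)


words = ["indexes", "Paris", "news", "stemmed"]
algorithmic = {w: algorithmic_stem(w) for w in words}
dictionary = {w: dictionary_stem(w) for w in words}

for w in words:
    print(f"{w}: algorithmic={algorithmic[w]} dictionary={dictionary[w]}")
```

With this toy rule set "Paris" and "news" are cut down to "pari" and
"new" by the suffix stripper, while the dictionary version leaves them
alone. The cost is that someone has to build and maintain the dictionary.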

This is quite a common problem, but I don't actually know what the
common solution is :)

(Unless you can mark up your languages properly. Then you just have to
worry about how you stem your query.)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


