[Xapian-discuss] Stemming, stopping, and multiple languages

Sean McCleary sean.mccleary at gmail.com
Sun Jul 19 16:22:09 BST 2009


Hello all, I'm just getting started with Xapian and have a question
about stemming, stop words, and multiple languages.

So I have two Xapian databases, one containing documents in English,
and another containing documents in German.  When I index them, I
don't use a stemmer or stop words, as I've read that it's considered
best practice to apply a stemmer and stop words at the time of
searching, not indexing.

So when I'm searching one database at a time, it's easy: I load a
stemmer for the appropriate language and the matching stop word list.

When I want to search through both at once, I can easily load both
databases.  But it seems that the stemmer and stop words are applied
to the query, not the databases.  So if I had, for example, the
word "die" (which means "the") in my list of German stop words, it
would also exclude the word "die" (as in, "cease to be alive") from
any English documents as well, right?  The same problem applies to the
stemmer -- I can only load one for one of the languages.
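
To make the concern concrete, here is a toy sketch in plain Python. It
does not use the Xapian API at all; the stop word lists, the sample
documents, and the helper names strip_stop_words and search are all
made up for illustration. It only shows what I think happens when stop
words are stripped from the query before it is run against a combined
set of documents.

```python
# Toy illustration only: plain Python, no Xapian calls.
# The word lists and documents below are hypothetical samples.
GERMAN_STOP_WORDS = {"die", "der", "das", "und"}
ENGLISH_STOP_WORDS = {"the", "and", "of"}


def strip_stop_words(query, stop_words):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [term for term in query.lower().split() if term not in stop_words]


def search(terms, documents):
    """Return the sorted ids of documents that contain every query term."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    )


english_docs = {"en1": "heroes never die", "en2": "heroes live on"}
german_docs = {"de1": "die helden leben"}
combined_docs = {**english_docs, **german_docs}

query = "heroes die"

# Searching the combined set with both stop lists loaded: the German
# article "die" removes the English verb "die" from the query.
combined_terms = strip_stop_words(query, GERMAN_STOP_WORDS | ENGLISH_STOP_WORDS)
combined_results = search(combined_terms, combined_docs)

# Searching with only the English stop list keeps "die" in the query.
english_terms = strip_stop_words(query, ENGLISH_STOP_WORDS)
english_results = search(english_terms, combined_docs)
```

With both lists loaded, the query is reduced to just "heroes", so the
document that actually mentions dying can no longer be told apart from
the one that doesn't.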

Is there any way around this?  Or does this mean I need to apply
stemmers and stop words at the time of indexing to get this to work?

Thanks for any advice,

Sean


