[Xapian-discuss] Stopword addition and stemming

goran kent gorankent at gmail.com
Mon Nov 15 08:35:59 GMT 2010


Two questions I'm unsure about:

Stemming:  I've turned on stemming, etc, but how can I confirm that
it's being used in searches?  What should I look/search for?

Stopwords:  I'm trying out Xapian on a regional dataset (e.g. searching
data from a *.co.us TLD).  I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), apparently
because it searches for two extremely common terms, "co" and "us"
(almost every document will have something.co.us in it), as well as the
not-so-common "bob".  Searching only for "bob" is quick.

Would it make sense to add "co" and "us" to the stopword list to
prevent that kind of catastrophic slowdown in search time?  Since the
dataset is obviously about ".co.us" I feel it's kind of redundant to
be searching for something you know is there...
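
To make the idea concrete, here is a library-independent sketch of what
I mean by stopping "co" and "us".  The stopword set and the helper names
are hypothetical, and the tokenising is a crude stand-in: it says
nothing about how Xapian's own query parser treats "co.us".  In Xapian
itself the equivalent would presumably be a SimpleStopper handed to
QueryParser's set_stopper().

```python
import re

# Hypothetical dataset-specific stopwords: the two terms from this post.
REGIONAL_STOPWORDS = {"co", "us"}


def split_terms(query):
    """Lowercase the query and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", query.lower())


def drop_stopwords(terms, stopwords=REGIONAL_STOPWORDS):
    """Remove stopwords, but never return an empty query."""
    kept = [term for term in terms if term not in stopwords]
    # If every term was a stopword, fall back to the original terms.
    return kept if kept else list(terms)


terms = split_terms("bob co.us")
filtered = drop_stopwords(terms)
print(filtered)  # prints ['bob']
```

The fallback matters for a query such as [co.us] on its own, which
would otherwise be reduced to nothing.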

