[Xapian-discuss] Stopword addition and stemming

Marinos Yannikos mjy at pobox.com
Wed Nov 17 20:33:41 GMT 2010


On 15.11.2010 09:35, goran kent wrote:
> Would it make sense to add "co" and "us" to the stopword list to
> prevent that kind of catastrophic slowdown in search time?  Since the
> dataset is obviously about ".co.us" I feel it's kind of redundant to
> be searching for something you know is there...

I'd simply strip ".co.us" from search queries (when it is present at all), and
also from the input to be indexed if it can be assumed to always be there.
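
A minimal sketch of that stripping step, in plain Python so it can run before
the text reaches Xapian at all. The function name and the SUFFIX constant are
made up for illustration; the same function would be applied both to incoming
queries and to the text being indexed, so the two stay consistent.

```python
SUFFIX = ".co.us"  # assumption: the suffix shared by every entry in the dataset


def strip_common_suffix(text: str, suffix: str = SUFFIX) -> str:
    """Drop a trailing suffix from each whitespace-separated token, if present."""
    tokens = []
    for token in text.split():
        # Compare case-insensitively, and leave a token alone if it consists
        # of nothing but the suffix, so we never produce an empty token.
        if token.lower().endswith(suffix) and len(token) > len(suffix):
            token = token[: -len(suffix)]
        tokens.append(token)
    return " ".join(tokens)
```

With this, a query for "foo-bar.co.us" is passed on as "foo-bar", and a query
that never contained the suffix is passed on unchanged.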

One thing I tripped over while working on a Xapian-based search for data that
isn't natural-language text: be aware that Xapian treats some characters
specially. For example, if you throw a hyphen at the parser, it will also
match the terms before and after the hyphen joined together without it (i.e.
as one word). This might not be what you want: if someone searches for
"foo-bar.co.us", you might not want to show them results for "foobar.co.us".
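
One way around this that needs no change to the parser is to filter the
results afterwards. The sketch below is plain Python and assumes you can get
the original, unmodified name back for each hit (for example because you
stored it alongside the document); keep_exact_matches is a hypothetical helper
name, not part of Xapian.

```python
def keep_exact_matches(query_name: str, candidate_names: list[str]) -> list[str]:
    """Keep only candidates whose name equals the query, ignoring case.

    Hyphens are significant here, so "foo-bar.co.us" and "foobar.co.us"
    are treated as different names.
    """
    wanted = query_name.strip().lower()
    return [name for name in candidate_names if name.strip().lower() == wanted]
```

Applied to the hits for "foo-bar.co.us", this drops "foobar.co.us" and keeps
"foo-bar.co.us". It only suits lookups where an exact name is expected, not
free-text search.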

Regards,
  Marinos



