[Xapian-discuss] Stopword addition and stemming
Marinos Yannikos
mjy at pobox.com
Wed Nov 17 20:33:41 GMT 2010
On 15.11.2010 09:35, goran kent wrote:
> Would it make sense to add "co" and "us" to the stopword list to
> prevent that kind of catastrophic slowdown in search time? Since the
> dataset is obviously about ".co.us" I feel it's kind of redundant to
> be searching for something you know is there...
I'd simply strip ".co.us" from search queries (where present) and, if it can
be assumed to always be present, from the input to be indexed as well.
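
A minimal sketch of that stripping step, assuming the suffix only ever appears
at the end of a whitespace-separated token. The function names are made up for
illustration and nothing here touches the Xapian API; the same helper would be
applied to the text before indexing and to the query string before parsing.

```python
SUFFIX = ".co.us"  # the suffix assumed to be redundant in this dataset


def strip_suffix(token, suffix=SUFFIX):
    """Return token without a trailing suffix (compared case-insensitively)."""
    cleaned = token.strip()
    if cleaned.lower().endswith(suffix):
        return cleaned[:-len(suffix)]
    return cleaned


def strip_suffix_from_text(text, suffix=SUFFIX):
    """Apply strip_suffix to every whitespace-separated token in text."""
    parts = [strip_suffix(tok, suffix) for tok in text.split()]
    # Drop tokens that consisted of nothing but the suffix.
    return " ".join(p for p in parts if p)
```

Tokens without the suffix pass through unchanged, so it is harmless to run
this over queries that never mention ".co.us".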
One thing I tripped over while working on a Xapian-based search for data that
isn't natural-language text: be aware that Xapian treats some characters
specially. For example, if you throw a hyphen at the parser, it will also
match the terms before and after it joined without the hyphen (i.e. as one
word). This might not be what you want: someone searching for "foo-bar.co.us"
probably shouldn't be shown results for "foobar.co.us".
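
One possible way around this (a sketch, not something tested against this
dataset) is to keep the whole hostname as a single exact-match term alongside
the parsed text, so the parser never gets to split or join it. The "H" prefix
and the function name below are assumptions for illustration; the resulting
string would be added to the document as a boolean term and used as a filter
at query time.

```python
HOST_PREFIX = "H"  # hypothetical term prefix reserved for hostnames


def exact_host_term(hostname):
    """Build one exact-match term for a hostname, hyphens left intact."""
    return HOST_PREFIX + hostname.strip().lower()
```

With this, "foo-bar.co.us" and "foobar.co.us" yield different terms, so a
filter on one can't match documents carrying only the other.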
Regards,
Marinos