[Xapian-discuss] Re: searching and sorting by date
Olly Betts
olly at survex.com
Wed Mar 29 17:40:38 BST 2006
On Wed, Mar 29, 2006 at 04:14:50PM +0100, James Aylett wrote:
> Stemming in general is actually harmful.
That's a bit strong.
TREC tests and the like provide a lot of evidence that stemming improves
retrieval. It's true that it can be harmful in cases where words that are
unrelated (or not closely enough related) get conflated, but *NOT*
conflating words is also harmful in many cases, and on balance stemming is
a win.
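To make the trade-off concrete, here is a minimal sketch of both kinds of conflation. It uses a deliberately crude suffix stripper (`toy_stem` and `conflate` are hypothetical names invented for this illustration), not the Snowball stemmers Xapian actually uses, so the exact groupings a real stemmer produces will differ.

```python
# Toy suffix-stripping stemmer, for illustration only.
# Assumption: this is NOT the algorithm Xapian uses; it just strips one
# common English suffix so that conflation is easy to see.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Lowercase the word and strip at most one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group the original words by their toy stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


# Helpful conflation: inflected forms of one word share a stem.
print(conflate(["search", "searching", "searched"]))

# Harmful conflation: the surname "Banks" lands in the same group as "banking".
print(conflate(["Banks", "banking"]))
```

The second call shows the proper-noun problem: once case is folded and the suffix is stripped, a query for the surname matches documents about banking.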
> Do we have figures on how often unwanted stem conflation happens with
> capitalised terms in English?
I don't. It's more problematic when it does happen though, since most
people are understandably attached to their names!
> Also, do we have figures on how often users bother to capitalise
> proper nouns in natural language search? I'm guessing: not often.
I can't see how to give quantitative figures without spending a lot of
time analysing logs by hand, but from a quick inspection it seems that
users capitalise proper nouns more often than not.
Cheers,
Olly