[Xapian-discuss] Re: searching and sorting by date

Olly Betts olly at survex.com
Wed Mar 29 17:40:38 BST 2006


On Wed, Mar 29, 2006 at 04:14:50PM +0100, James Aylett wrote:
> Stemming in general is actually harmful.

That's a bit strong.

TREC tests and the like provide a lot of evidence that stemming improves
retrieval.  It's true that it can be harmful when words that are unrelated
(or not closely enough related) get conflated, but *NOT* conflating words
is also harmful in many cases, and on balance stemming is a win.
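
To make "conflation" concrete, here's a rough sketch.  It uses a toy
suffix stripper, which is NOT the stemmer Xapian actually uses; the
suffix list and the names toy_stem and conflate are made up purely for
illustration.  It groups words by stem, so you can see both the helpful
case and the unwanted one:

```python
# Toy suffix stripper, for illustration only.  This is NOT the Porter or
# Snowball algorithm; the suffix list below is a hypothetical, tiny sample.
SUFFIXES = ("ities", "ity", "ing", "al", "es", "ed", "e", "s")


def toy_stem(word):
    """Lowercase the word and strip the longest matching suffix."""
    w = word.lower()
    # Try longer suffixes first so that "ities" is preferred over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[:-len(suffix)]
    return w


def conflate(words):
    """Group words that share a stem, i.e. that an index would conflate."""
    groups = {}
    for word in words:
        groups.setdefault(toy_stem(word), []).append(word)
    return groups


# Helpful conflation: different forms of one word match each other.
print(conflate(["searching", "searches", "searched"]))

# Unwanted conflation: words with quite different meanings share a stem.
print(conflate(["university", "universe", "universal"]))
```

A real stemmer has far more careful rules, but the trade-off is the same:
the first group is what you want from stemming, the second is the price.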

> Do we have figures on how likely unwanted stem conflation happens with
> capitalised terms in English?

I don't.  It's more problematic when it does happen though, since most
people are understandably attached to their names!

> Also, do we have figures on how often users bother to capitalise
> proper nouns in natural language search? I'm guessing: not often.

I can't see how to give quantitative figures without spending a lot of
time analysing logs by hand, but from a quick inspection it seems they
do capitalise more often than not.

Cheers,
    Olly


