[Xapian-discuss] Re: searching and sorting by date

James Aylett james-xapian at tartarus.org
Wed Mar 29 18:14:44 BST 2006


On Wed, Mar 29, 2006 at 05:40:38PM +0100, Olly Betts wrote:

> > Stemming in general is actually harmful.
> 
> That's a bit strong.
> 
> TREC tests and the like provide a lot of evidence that stemming improves
> retrieval.  It's true that it can be harmful in cases when words that are
> unrelated (or not closely related enough) get conflated, but then *NOT*
> conflating words is also harmful in many cases and on balance stemming is
> a win.

My point is that stemming is a destructive activity (hence: harmful),
not that it isn't useful.

> > Do we have figures on how often unwanted stem conflation happens with
> > capitalised terms in English?
> 
> I don't.  It's more problematic when it does happen though, since most
> people are understandably attached to their names!

True. When Omega databases are used as intended, everything works - I'm
just wondering whether correct or incorrect usage is more likely to be
the edge case (in the general case - most domain-specific users are
reasonably easy to train).

> > Also, do we have figures on how often users bother to capitalise
> > proper nouns in natural language search? I'm guessing: not often.
> 
> I can't see how to give quantitative figures without spending a lot of
> time analysing logs by hand, but it seems they do more often than not
> from a quick inspection.

Hmm. My experience (completely unscientific, watching people type into
Google) suggests otherwise. It may be to do with the type of audience,
of course.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


