[Xapian-discuss] Re: searching and sorting by date
James Aylett
james-xapian at tartarus.org
Wed Mar 29 18:14:44 BST 2006
On Wed, Mar 29, 2006 at 05:40:38PM +0100, Olly Betts wrote:
> > Stemming in general is actually harmful.
>
> That's a bit strong.
>
> TREC tests and the like provide a lot of evidence that stemming improves
> retrieval. It's true that it can be harmful in cases when words that are
> unrelated (or not closely related enough) get conflated, but then *NOT*
> conflating words is also harmful in many cases and on balance stemming is
> a win.
My point is that stemming is a destructive activity (hence: harmful),
not that it isn't useful.
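
To illustrate "destructive" here: a stemmer maps many surface forms onto one term, and the original form cannot be recovered from the stem alone. The sketch below is a deliberately crude suffix stripper, not Xapian's stemmer or any real stemming algorithm; the suffix list and the names toy_stem and conflate are made up for this example.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; real stemmers use
# far more careful rules than blind suffix stripping.
SUFFIXES = ("ities", "ity", "ing", "al", "es", "e", "s")


def toy_stem(word):
    """Lowercase the word and strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words by stem to show which forms become indistinguishable."""
    groups = defaultdict(list)
    for w in words:
        groups[toy_stem(w)].append(w)
    return dict(groups)


if __name__ == "__main__":
    sample = ["universe", "university", "universal", "searching", "searches"]
    for stem, forms in conflate(sample).items():
        print(stem, "<-", forms)
```

In this toy, "searching" and "searches" share a stem, which is the useful case, but so do "universe", "university" and "universal", which is the kind of conflation being argued about. Either way, once only the stem is indexed, the distinction between the forms is gone.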
> > Do we have figures on how often unwanted stem conflation happens with
> > capitalised terms in English?
>
> I don't. It's more problematic when it does happen though, since most
> people are understandably attached to their names!
True. When Omega databases are used as intended, everything works. I'm
just wondering whether correct or incorrect usage is more likely to be
a boundary condition (in the general case; most domain-specific users
are reasonably easy to train).
> > Also, do we have figures on how often users bother to capitalise
> > proper nouns in natural language search? I'm guessing: not often.
>
> I can't see how to give quantitative figures without spending a lot of
> time analysing logs by hand, but it seems they do more often than not
> from a quick inspection.
Hmm. My experience (completely unscientific, watching people type into
Google) suggests otherwise. It may be to do with the type of audience,
of course.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org