[Xapian-discuss] Re: searching and sorting by date

James Aylett james-xapian at tartarus.org
Wed Mar 29 16:14:50 BST 2006


On Wed, Mar 29, 2006 at 04:02:56PM +0100, Olly Betts wrote:

> > Raw terms are actually more to do with stemming - we don't stem when
> > generating raw terms, but we do for all others. [snip]
>
> It's often actually harmful rather than just not that helpful.
> 
> The first example which comes to mind is that the English stemmer
> conflates the names "tony" (usually male) and "toni" (usually female)
> (both are stemmed to "toni"), but there are numerous others.

Stemming in general is actually harmful. Do we have figures on how
often unwanted stem conflation happens with capitalised terms in
English?
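
For anyone who wants to see the tony/toni conflation quoted above in
isolation, here is a minimal sketch. It assumes the culprit is the
final-y rule (step 1c of the Snowball English stemmer) and implements
only that one rule in plain Python; it is not Xapian's stemmer, and the
names step_1c and index_term are hypothetical. The "leave capitalised
words unstemmed" policy in index_term is an assumption made purely for
illustration.

```python
# Hypothetical sketch of a single rule (step 1c of the Snowball English
# stemmer). This is NOT Xapian's stemmer and skips every other step.
VOWELS = set("aeiouy")


def step_1c(word: str) -> str:
    """Replace a final 'y' with 'i' when it follows a non-vowel that is
    not the first letter of the word."""
    if len(word) > 2 and word[-1] == "y" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


def index_term(word: str) -> str:
    """Assumed policy: keep capitalised words as-is, stem everything else."""
    if word[:1].isupper():
        return word  # treated as a raw (unstemmed) term
    return step_1c(word.lower())


if __name__ == "__main__":
    for w in ("tony", "toni", "Tony", "Toni"):
        print(w, "->", index_term(w))
```

Under these assumptions the two lower-case names collapse to the same
term while the capitalised forms stay distinct, which only helps if
users actually type the capital letter.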

Also, do we have figures on how often users bother to capitalise
proper nouns in natural language search? I'm guessing: not often.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


