[Xapian-discuss] Re: searching and sorting by date
James Aylett
james-xapian at tartarus.org
Wed Mar 29 16:14:50 BST 2006
On Wed, Mar 29, 2006 at 04:02:56PM +0100, Olly Betts wrote:
> > Raw terms are actually more to do with stemming - we don't stem when
> > generating raw terms, but we do for all others. [snip]
>
> It's often actually harmful rather than just not that helpful.
>
> The first example which comes to mind is that the English stemmer
> conflates the names "tony" (usually male) and "toni" (usually female)
> (both are stemmed to "toni"), but there are numerous others.
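To make the conflation concrete, here is a minimal Python sketch. It assumes the culprit is the Snowball English stemmer's final-"y" rule (a trailing "y" becomes "i" after a non-vowel that is not the word's first letter). It implements that single rule only, not the full stemmer. The names step_1c and query_term are hypothetical, and the capitalisation check is only an illustration of the "don't stem raw terms" idea, not Xapian's actual code.

```python
VOWELS = set("aeiouy")

def step_1c(word):
    """One stemmer rule only: final 'y' -> 'i' after a non-vowel
    that is not the first letter (cry -> cri, by -> by, say -> say)."""
    if len(word) > 2 and word[-1] == "y" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word

def query_term(word):
    """Hypothetical query-side heuristic: a capitalised word is treated
    as a proper noun and left unstemmed; anything else gets the rule."""
    if word[:1].isupper():
        return word.lower()          # raw term: lowercased, not stemmed
    return step_1c(word.lower())     # ordinary term: stemmed

# Lowercase input conflates the two names...
print(step_1c("tony"), step_1c("toni"))
# ...while capitalised input keeps them apart.
print(query_term("Tony"), query_term("Toni"))
```

The second print only helps if users actually type the capital letter, which is the open question here.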
Stemming in general is actually harmful. Do we have figures on how
often unwanted stem conflation happens with capitalised terms in
English?
Also, do we have figures on how often users bother to capitalise
proper nouns in natural language search? I'm guessing: not often.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org