[Xapian-discuss] Re: searching and sorting by date

Olly Betts olly at survex.com
Wed Mar 29 16:02:56 BST 2006


On Fri, Mar 24, 2006 at 09:46:43AM +0000, James Aylett wrote:
> Raw terms are actually more to do with stemming - we don't stem when
> generating raw terms, but we do for all others. (Sorry, I may not have
> made that clear before.) I imagine this is because capitalised words
> in English are likely to be names, where stemming isn't that helpful
> (Richard or Olly should be able to confirm this).

More precisely, stemming names is often actively harmful rather than
merely unhelpful.

The first example which comes to mind is that the English stemmer
conflates the names "tony" (usually male) and "toni" (usually female)
(both are stemmed to "toni"), but there are numerous others.
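
To make the conflation concrete, here is a minimal sketch in Python. It
is not Xapian's code: step_1c() reimplements only the single "final y
becomes i" rule from the Snowball English stemmer (not the whole
algorithm), and make_term() is a hypothetical helper that assumes raw
terms are marked with an "R" prefix and lowercased.

```python
VOWELS = set("aeiouy")


def step_1c(word: str) -> str:
    """One rule only: a final 'y' becomes 'i' when it is preceded by a
    non-vowel that is not the first letter of the word."""
    if len(word) > 2 and word[-1] == "y" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


def make_term(token: str) -> str:
    """Hypothetical term generator (an assumption for illustration):
    capitalised tokens become unstemmed raw terms with an assumed 'R'
    prefix; everything else is lowercased and passed to step_1c()."""
    if token[:1].isupper():
        return "R" + token.lower()
    return step_1c(token.lower())


if __name__ == "__main__":
    for token in ("tony", "toni", "Tony", "Toni"):
        print(token, "->", make_term(token))
```

With stemming, "tony" and "toni" end up as the same term; as raw terms,
"Tony" and "Toni" stay distinct, so a search for one name does not
match the other.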

Cheers,
    Olly


