[Xapian-discuss] Re: searching and sorting by date
Olly Betts
olly at survex.com
Wed Mar 29 16:02:56 BST 2006
On Fri, Mar 24, 2006 at 09:46:43AM +0000, James Aylett wrote:
> Raw terms are actually more to do with stemming - we don't stem when
> generating raw terms, but we do for all others. (Sorry, I may not have
> made that clear before.) I imagine this is because capitalised words
> in English are likely to be names, where stemming isn't that helpful
> (Richard or Olly should be able to confirm this).
More precisely, stemming names is often actively harmful rather than
merely unhelpful.
The first example that comes to mind is that the English stemmer
conflates the names "tony" (usually male) and "toni" (usually female):
both are stemmed to "toni". There are numerous others.
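To make the conflation concrete, here is a minimal sketch in Python. It
is not Xapian's code: step_1c() reproduces only the single rule of the
Snowball English stemmer that rewrites a final "y" as "i", and
term_for() is a hypothetical helper showing the "leave capitalised
words unstemmed" idea. The form of the raw term (the word unchanged) is
an assumption for illustration.

```python
# Sketch only: one rule of the Snowball English stemmer, not the full algorithm.
VOWELS = set("aeiouy")


def step_1c(word: str) -> str:
    """Replace a final 'y' with 'i' when the letter before it is a
    non-vowel that is not the first letter of the word."""
    if len(word) > 2 and word[-1] == "y" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


def term_for(word: str) -> str:
    """Hypothetical helper: capitalised words are kept as raw
    (unstemmed) terms, everything else goes through the stemming rule."""
    if word[:1].isupper():
        return word  # assumed raw-term form: the word as written
    return step_1c(word.lower())


print(step_1c("tony"), step_1c("toni"))    # toni toni
print(term_for("Tony"), term_for("Toni"))  # Tony Toni
```

With stemming, a search for either name matches documents about both
people. Keeping the capitalised forms as raw terms keeps them apart.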
Cheers,
Olly