[Xapian-discuss] Stemming problem

James Aylett james-xapian at tartarus.org
Wed Jul 4 13:59:53 BST 2007


On Wed, Jul 04, 2007 at 01:43:27PM +0100, Richard Boulton wrote:

> > I created sample files with test, tester, testing, tests. A query of 
> > "test" could find all of them except "tester".
> 
> I'm not convinced that "tester" should stem to "test" - the meaning is
> quite different.  Also, that would have to be a special case: for
> example, a rule to convert "tester" to "test" would also convert
> "master" to "mast" - which is definitely wrong.

Most short -er words shouldn't have the -er stemmed off, I suspect. (In
general, verbs?) I think we only stem it off if the remaining prefix is
>= 6 characters? I don't really speak Snowball...

'Computer' is an interesting one. Stemming conflates it with 'compute',
even though the meanings differ. Not sure 'compute' is common enough
that we should care, though.
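
For anyone wanting to poke at why 'tester' and 'master' keep their -er
while 'computer' loses it, here is a rough sketch. It assumes the
Snowball English (Porter2) notion of the R2 region, where a suffix like
-er or a final -e is only removed if it lies entirely inside R2. The
function names (region_start, r2_start, toy_stem) are made up for this
illustration; this is not Xapian's or Snowball's actual code, and it
ignores every other step of the stemmer, the special handling of 'y',
the exceptional prefixes, and the second (R1, short syllable) case for
removing a final -e.

```python
VOWELS = set("aeiouy")


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    looking only at word[start:]. Returns len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r2_start(word):
    """R1 is the region found from the start of the word; R2 is the
    same kind of region computed inside R1."""
    r1 = region_start(word)
    return region_start(word, r1)


def toy_stem(word):
    """Toy rule only: drop a trailing 'er' or 'e' when the suffix
    sits entirely inside R2; otherwise leave the word alone."""
    r2 = r2_start(word)
    for suffix in ("er", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= r2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("tester", "master", "computer", "compute"):
        print(w, "->", toy_stem(w))
```

Under these assumptions R2 is empty for 'tester' and 'master', so the
-er stays, while in 'computer' and 'compute' the suffix falls inside R2
and both come out as 'comput', which is the conflation mentioned above.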

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


