[Xapian-discuss] Stemming problem
James Aylett
james-xapian at tartarus.org
Wed Jul 4 13:59:53 BST 2007
On Wed, Jul 04, 2007 at 01:43:27PM +0100, Richard Boulton wrote:
> > I created sample files with test, tester, testing, tests. A query of
> > "test" could find all of them except "tester".
>
> I'm not convinced that "tester" should stem to "test" - the meaning is
> quite different. Also, that would have to be a special case: for
> example, a rule to convert "tester" to "test" would also convert
> "master" to "mast" - which is definitely wrong.
I suspect most short -er words shouldn't have the -er stemmed off. (In
general, verbs?) I think we only strip it when the remaining prefix is
>= 6 characters? I don't really speak Snowball...
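
For reference, a minimal Python sketch of the region test that the
published Snowball English algorithm describes for suffixes such as -er:
the suffix is dropped only when it lies entirely inside the R2 region of
the word. The helper names (region_after, r2_start, strips_er) are made up
for illustration, every 'y' is treated as a vowel, and the stemmer's
special-case words are ignored, so this approximates the rule and is not
the real implementation.

```python
VOWELS = set("aeiouy")  # simplification: every 'y' counts as a vowel


def region_after(word, start):
    """Return the index just past the first non-vowel that follows a
    vowel, scanning from `start`; len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r2_start(word):
    """R1 starts after the first vowel/non-vowel pair; R2 applies the
    same rule again inside R1."""
    r1 = region_after(word, 0)
    return region_after(word, r1)


def strips_er(word):
    """True if a trailing -er lies wholly inside R2 (hypothetical helper)."""
    return word.endswith("er") and len(word) - 2 >= r2_start(word)


for w in ("tester", "master", "computer"):
    print(w, strips_er(w))
```

Under these assumptions "tester" and "master" both have an empty R2, so
the -er stays, while "computer" has R2 = "er" and loses it.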
'Computer' is an interesting one. Stemming conflates it with
'compute', even though the two mean different things. I'm not sure
'compute' is common enough that we should care, though.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org