[Xapian-devel] Odd stemmer behavior

Olly Betts olly at survex.com
Fri Apr 21 00:54:19 BST 2006


On Thu, Apr 20, 2006 at 08:45:11PM -0300, Paul Legato wrote:
> I've noticed some strange results from the stemmer in the Ruby port:
> 
> irb(main):003:0> @stem.stem_word("anybody")
> => "anybodi"
> irb(main):004:0> @stem.stem_word("swimmingly")
> => "swim"
> irb(main):005:0> @stem.stem_word("fiercely")
> => "fierc"
> irb(main):006:0> @stem.stem_word("fraudulently")
> => "fraudul"
> 
> Is it supposed to behave like this, or is this a bug in my Ruby wrapper?

Those are the expected results.  The stem isn't necessarily a word
itself, though it usually looks much like one.

In this case "swimmingly" -> "swim" is an example of overstemming:
"swim" also stems to "swim", so the two words are conflated even though
they don't really share anything in meaning (I imagine they share a
linguistic root, but that's not what we care about here).
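
To illustrate why a non-word stem is harmless, here is a minimal Ruby
sketch.  It assumes a hypothetical toy_stem helper with a handful of
suffix rules; this is NOT the algorithm the real English stemmer uses,
and it only reproduces a few of the stems quoted above.  The idea, in
the usual setup, is that indexed terms and query terms go through the
same stemmer, so "fierce" and "fiercely" meet at "fierc".  The same
mechanism is what conflates "swim" with "swimmingly".

```ruby
# Toy suffix stripper, for illustration only. This is NOT the Porter
# algorithm; toy_stem is a hypothetical helper that mimics a few rules.
def toy_stem(word)
  w = word.downcase
  w = w.sub(/ly\z/, "")               # "fiercely" -> "fierce"
  w = w.sub(/ing\z/, "")              # "swimming" -> "swimm"
  w = w.sub(/([^aeiou])\1\z/, '\1')   # "swimm"    -> "swim"
  w = w.sub(/e\z/, "")                # "fierce"   -> "fierc"
  w.sub(/y\z/, "i")                   # "anybody"  -> "anybodi"
end

# Index documents under stemmed terms (stem => list of document ids).
def build_index(docs)
  index = Hash.new { |hash, stem| hash[stem] = [] }
  docs.each do |id, text|
    text.split.each { |word| index[toy_stem(word)] << id }
  end
  index
end

# Stem the query word the same way before looking it up.
def search(index, query_word)
  index.fetch(toy_stem(query_word), []).uniq
end

docs = {
  1 => "they fought fiercely",
  2 => "the launch went swimmingly",
  3 => "anybody can swim"
}
index = build_index(docs)

p search(index, "fierce")   # [1]      matches "fiercely" via "fierc"
p search(index, "swim")     # [2, 3]   overstemming pulls in document 2
```

Nobody ever needs to read "fierc" or "anybodi"; they only have to be
the same string on both sides of the lookup.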

Cheers,
    Olly


