[Xapian-discuss] Stemming problem
Olly Betts
olly at survex.com
Wed Jul 4 14:36:35 BST 2007
On Wed, Jul 04, 2007 at 01:59:53PM +0100, James Aylett wrote:
> Most short -er words shouldn't stem the -er off, I suspect. (In
> general, verbs?) I think we're stemming if the prefix >= 6 characters?
It's generally defined in terms of consonants and vowels, where a run
of consecutive consonants or vowels counts once:
http://snowball.tartarus.org/texts/r1r2.html
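As a rough illustration of the R1/R2 definition on that page (not the Snowball source itself), here is a minimal Python sketch. The names VOWELS, region_after_vc and r1_r2 are made up for this example, and it assumes the plain English vowel set a, e, i, o, u, y; the real English stemmer adds special cases on top of this basic rule.

```python
# Sketch of the basic R1/R2 regions described on the Snowball r1r2 page.
# Assumption: vowels are a, e, i, o, u, y, with no language-specific exceptions.
VOWELS = set("aeiouy")


def region_after_vc(word, start=0):
    """Return the index just past the first non-vowel that follows a vowel,
    looking only at word[start:]. Returns len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the (R1, R2) regions of word as strings."""
    r1 = region_after_vc(word)
    # R2 applies the same rule again, but inside R1.
    r2 = region_after_vc(word, r1)
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ("beautiful", "beauty", "beau", "sprinkled"):
        print(w, r1_r2(w))
```

So the cut-off depends on the vowel/consonant pattern rather than a character count: "beau" has an empty R1, while "beautiful" has R1 "iful" and R2 "ul".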
I think I'd rather avoid maintaining variants of the snowball
algorithms, so if there are useful changes to make I think they should
be proposed to the snowball project, and if accepted, we'd then import
the updated version in due course.
Cheers,
Olly