[Xapian-discuss] Stem doesn't remove ' from end of words that are possessive?

Richard Boulton richard at lemurconsulting.com
Tue Feb 13 16:38:06 GMT 2007


Jarrod Roberson wrote:
> If I stem a word with  " 's " at the end it strips off the s and leaves the
> ' hanging.
> this means that possessives don't stem down to a useful term.
> Removing the " ' " as well as the trailing " s " would be a more useful
> behavior.
> 
> For example:
>    uncle's stems down to uncle'
> if it stemmed down to just " uncle " that would be a much more useful
> behavior I would believe.
> 
> Is this the intended behavior? If so why?

The current stemming algorithms (as used in Xapian version 0.9.9) don't 
have any special code for handling apostrophes at all.

Olly is right in the middle of updating SVN HEAD to the latest versions 
of the stemming algorithms, in which the English stemmer handles 
apostrophes.  This means that "uncle", "uncle's" and "uncles'" will all 
stem to "uncl" in the next release of Xapian.
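
To illustrate the idea, here is a minimal Python sketch of just the
possessive-stripping step, assuming it works by removing the longest of
the suffixes "'s'", "'s" and "'" before the rest of the stemmer runs.
The name strip_possessive is made up for this example; it is not part of
Xapian or of the stemmer code, and the real stemmer then goes on to
apply its usual suffix rules to the result.

```python
# Hypothetical helper for illustration only, not Xapian's implementation.
# It performs only the apostrophe handling; a real stemmer would continue
# with its normal suffix-stripping rules afterwards.

# Ordered longest first so that "'s'" is tried before "'s" and "'".
POSSESSIVE_SUFFIXES = ("'s'", "'s", "'")


def strip_possessive(word):
    """Remove one trailing possessive marker from word, if present."""
    for suffix in POSSESSIVE_SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


for w in ("uncle", "uncle's", "uncles'"):
    print(w, "->", strip_possessive(w))
```

After this step "uncle's" is left as "uncle" and "uncles'" as "uncles",
with no dangling apostrophe, so the later stemming rules see ordinary
words.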

We may also need to do some work on the query parser and text processing 
code in Omega to ensure that apostrophes are passed through to the 
stemming algorithm correctly; I'm not sure exactly which characters 
currently get stripped out before being passed to the stemmers.
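
As a rough sketch of what "passed through correctly" could mean, the
Python below splits text into words while keeping apostrophes that are
inside or at the end of a word. This is only an assumption about one
possible approach; the tokenize function and its regular expression are
invented for this example and do not describe what Omega or the query
parser actually do.

```python
import re

# Hypothetical tokenizer for illustration only. A word is a run of ASCII
# letters/digits, optionally continued by apostrophe-joined runs
# ("uncle's") and optionally ending in a single apostrophe ("uncles'").
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*'?")


def tokenize(text):
    """Return the words in text, keeping apostrophes attached to them."""
    return WORD_RE.findall(text)


print(tokenize("my uncle's car and my uncles' cars"))
```

Note that a pattern like this cannot tell a plural possessive from a
closing single quote, so real text processing code would need to decide
how to treat that case.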

-- 
Richard


