[Xapian-discuss] Phrase Search on Stemmed Data

Olly Betts olly at survex.com
Wed Jan 9 02:03:40 GMT 2008


On Fri, Jan 04, 2008 at 09:36:23PM +0100, Deniz Dalli wrote:
> in the docs
> 
> >>>STEM_SOME: Search for stemmed forms of terms except for those which 
> start with a capital letter, or are followed by certain characters 
> (currently: (/@<>=*[{" ), or are used with operators which need 
> positional information. Stemmed terms are prefixed with 'Z'.<<<
>
> So, if I have terms with capital letters (as in German, ...) they won't 
> get stemmed, which is in fact not my desired behaviour ....

You're right, the STEM_SOME scheme isn't ideal for German.  There should
be a way to control whether an initial capital prevents stemming or not
when using STEM_SOME.  Perhaps a STEM_<something else> setting which does
this.  We should also consider renaming STEM_SOME to something clearer if
we're adding another option between the extremes.
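
For illustration, the rule quoted from the docs, and the kind of switch
suggested above, can be modelled roughly as follows.  This is a pure-Python
sketch and not Xapian's actual QueryParser code: stem_some_term(), toy_stem()
and the capital_blocks_stemming flag are hypothetical names, the stemmer is a
stand-in which only strips a trailing "s", and lowercasing the unstemmed
terms is an assumption.

```python
# Illustrative model only: this is NOT Xapian's QueryParser implementation.

# Characters which, per the quoted docs, suppress stemming when they
# immediately follow a term.
NO_STEM_FOLLOWERS = set('(/@<>=*[{"')


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one trailing 's'."""
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


def stem_some_term(word, next_char="", positional=False,
                   capital_blocks_stemming=True):
    """Return the term a STEM_SOME-like rule would produce for `word`.

    capital_blocks_stemming is a hypothetical switch: True models the
    documented behaviour, False models the proposed alternative in which
    an initial capital no longer prevents stemming.
    """
    unstemmed = word.lower()  # assumption: unstemmed terms are lowercased
    if positional:
        # Operators needing positional information (e.g. phrases).
        return unstemmed
    if next_char in NO_STEM_FOLLOWERS:
        return unstemmed
    if capital_blocks_stemming and word[:1].isupper():
        return unstemmed
    # Stemmed terms carry the 'Z' prefix mentioned in the docs.
    return "Z" + toy_stem(unstemmed)


if __name__ == "__main__":
    for kwargs in ({}, {"capital_blocks_stemming": False}):
        print(kwargs, stem_some_term("Autos", **kwargs))
```

With the flag left at its default, a capitalised German noun such as "Autos"
stays unstemmed; with the flag off it is stemmed like any other word, which
is roughly the behaviour asked for in the original message.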

> If I lowercase all words I get matches for stemmed terms, but I won't 
> get a match on a phrase search (when capitalized terms occur).

Sorry, I'm not sure what you mean here.

> Is that an issue I have to handle myself, or is there another 
> option in Xapian?

Not at the moment, but we should add a way, and it's not hard to do.
Could you please file a wishlist bug for this?

Cheers,
    Olly


