[Xapian-discuss] Phrase Search on Stemmed Data
Olly Betts
olly at survex.com
Wed Jan 9 02:03:40 GMT 2008
On Fri, Jan 04, 2008 at 09:36:23PM +0100, Deniz Dalli wrote:
> in the docs
>
> >>>STEM_SOME: Search for stemmed forms of terms except for those which
> start with a capital letter, or are followed by certain characters
> (currently: (/@<>=*[{" ), or are used with operators which need
> positional information. Stemmed terms are prefixed with 'Z'.<<<
>
> So, if I have terms with capital letters (as in German, ...) they won't
> get stemmed, which is in fact not my desired behaviour ....
You're right, the STEM_SOME scheme isn't ideal for German. There should
be a way to control whether an initial capital prevents stemming or not
when using STEM_SOME. Perhaps a STEM_<something else> setting which does
this. We should also consider renaming STEM_SOME to something clearer if
we're adding another option between the extremes.
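
To illustrate the rule quoted above, here is a minimal Python sketch of the
STEM_SOME decision as the docs describe it. It does not use Xapian's API and
is not its implementation. toy_stem() is a hypothetical stand-in for a real
stemmer, the capital_blocks_stemming flag is a hypothetical name for the
control suggested above, and lowercasing the unstemmed terms is an assumption
of the sketch.

```python
# Characters that, per the quoted docs, suppress stemming when they
# directly follow a term.
NO_STEM_FOLLOWERS = set('(/@<>=*[{"')


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def stem_some(term, next_char="", positional=False,
              capital_blocks_stemming=True):
    """Return the term to search for under a STEM_SOME-like rule.

    capital_blocks_stemming is a hypothetical switch: True mimics the
    documented behaviour, False lets capitalised terms be stemmed too.
    """
    # Operators needing positional information (e.g. phrases) use the
    # unstemmed form.
    if positional:
        return term.lower()
    # A following character from the documented list also prevents stemming.
    if next_char in NO_STEM_FOLLOWERS:
        return term.lower()
    # An initial capital prevents stemming unless the switch is turned off.
    if capital_blocks_stemming and term[:1].isupper():
        return term.lower()
    # Stemmed terms carry the 'Z' prefix.
    return "Z" + toy_stem(term.lower())
```

With the switch left on, stem_some("Houses") stays unstemmed, which is the
problem for German nouns; with capital_blocks_stemming=False it is stemmed
like any other term.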
> If I lowercase all words I get matches for stemmed terms, but I won't
> get a match on a phrase search (when capitalized terms occur).
Sorry, I'm not sure what you mean here.
> Is that an issue I have to handle myself, or is there another
> option in Xapian?
Not at the moment, but we should add a way, and it's not hard to do.
Could you please file a wishlist bug for this?
Cheers,
Olly