[Xapian-discuss] Stemming non-protein

Philip Neustrom philipn at gmail.com
Thu Mar 30 11:45:09 BST 2006


It's also difficult to know whether "-" should separate two terms or
join them into a single term.  In this case you want a single term,
but there are other cases where a user may want to find one word
inside a string-like-this.
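
A minimal sketch of that trade-off, in plain Python rather than Xapian's
own API. The names tokenize and build_index and the sample documents are
hypothetical, and stemming is left out entirely. It only shows how the
choice made at index time changes what a search for a single word finds:

```python
import re

def tokenize(text, keep_hyphenated=False):
    """Lowercase text and split it into index terms.

    Hypothetical helper, not Xapian's tokenizer. With keep_hyphenated=True,
    a hyphenated run such as "non-protein" stays one term; otherwise "-"
    acts as a separator.
    """
    if keep_hyphenated:
        pattern = r"[a-z0-9]+(?:-[a-z0-9]+)*"
    else:
        pattern = r"[a-z0-9]+"
    return re.findall(pattern, text.lower())

def build_index(docs, keep_hyphenated=False):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text, keep_hyphenated):
            index.setdefault(term, set()).add(doc_id)
    return index

# Made-up sample documents for illustration only.
docs = {
    1: "A non-protein nitrogen source",
    2: "Dietary protein intake",
    3: "Find a term in a string-like-this",
}

split_index = build_index(docs)                         # "-" separates terms
joined_index = build_index(docs, keep_hyphenated=True)  # "-" joins terms

# Splitting on "-": "protein" also matches the "non-protein" document.
print(sorted(split_index["protein"]))
# Joining on "-": "protein" no longer matches "non-protein" ...
print(sorted(joined_index["protein"]))
# ... but "like" can no longer be found inside "string-like-this".
print(sorted(joined_index.get("like", set())))
```

Neither setting is right for every query, which is the difficulty: keeping
the hyphenated form fixes the "non-protein" case and breaks the
string-like-this case.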

On 3/30/06, James Aylett <james-xapian at tartarus.org> wrote:
> On Wed, Mar 29, 2006 at 12:18:45PM -0500, Peter Masiar wrote:
>
> > Say, my user queries for "protein". Document might say "non-protein".
> > Will xapian match it? Is it possible to disable such matches?
>
> Currently (I believe - Olly may need to correct me) what will happen
> is that both "non" and "protein" will be generated as terms (well,
> they'll be stemmed too), but someone searching for "non-protein" will
> generate a PHRASE search "non" PHRASE(n) "protein" where n is
> something appropriate (probably 2?).
>
> So searching for "protein" will find anything containing
> "non-protein", which isn't always what you want. (Probably isn't very
> often what you want.)
>
> What you probably would need if you wanted to avoid this would be to
> generate "non-protein" as a term. ("protein" stemmed is still
> "protein" in our English stemmer.)
>
> J
>
> --
> /--------------------------------------------------------------------------\
>   James Aylett                                                  xapian.org
>   james at tartarus.org                               uncertaintydivision.org
>


