[Xapian-discuss] Stemming non-protein
Peter Masiar
peter.masiar at yale.edu
Wed Mar 29 18:18:45 BST 2006
I was inspired by the following exchange, but because the topic drifted, I
started a new thread:
Olly Betts wrote:
> On Wed, Mar 29, 2006 at 04:14:50PM +0100, James Aylett wrote:
>> Stemming in general is actually harmful.
>
> That's a bit strong.
>
> TREC tests and the like provide a lot of evidence that stemming improves
> retrieval. It's true that it can be harmful in cases when words that are
> unrelated (or not closely related enough) get conflated, but then *NOT*
> conflating words is also harmful in many cases and on balance stemming is
> a win.
I am interested in one special kind of stemming.
Say my user queries for "protein" and a document says "non-protein".
Will Xapian match it? If so, is it possible to disable such matches?
Sorry, I still don't have Omega running (the reasons are explained in my
next email question).
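
To make the question concrete, here is a toy sketch in Python. It is not
Xapian code: the tokenizer, the suffix rule, and the names tokenize,
toy_stem and index_terms are all made up for illustration. It only shows
the two outcomes I can imagine, depending on whether the indexer splits
words at the hyphen. If the hyphen is a word break, "protein" becomes an
index term of its own and the match would come from tokenization rather
than from stemming.

```python
import re

def tokenize(text):
    """Split on anything that is not a letter or digit (an assumed rule)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def toy_stem(word):
    """Hypothetical suffix-only stemmer: strips one trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def index_terms(text, split_hyphens=True):
    """Return the set of stemmed terms a toy indexer would store."""
    words = tokenize(text) if split_hyphens else text.lower().split()
    return {toy_stem(w) for w in words}

doc = "Non-protein nitrogen"

# Hyphen treated as a word break: "protein" is a term, so the query matches.
print("protein" in index_terms(doc))

# Hyphenated word kept whole: the only terms are "non-protein" and "nitrogen".
print("protein" in index_terms(doc, split_hyphens=False))
```

Which of these two cases describes what Xapian really does is exactly what
I am asking.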
--
Peter Masiar, Yale Center for Medical Informatics
A: Because it messes up the flow of reading.
Q: Why is top-posting often frowned upon?