[Xapian-discuss] Stemming non-protein

Peter Masiar peter.masiar at yale.edu
Wed Mar 29 18:18:45 BST 2006


I was inspired by the following exchange, but because the topic drifted, I 
started a new thread:

Olly Betts wrote:
 > On Wed, Mar 29, 2006 at 04:14:50PM +0100, James Aylett wrote:
 >> Stemming in general is actually harmful.
 >
 > That's a bit strong.
 >
 > TREC tests and the like provide a lot of evidence that stemming improves
 > retrieval.  It's true that it can be harmful in cases when words that are
 > unrelated (or not closely related enough) get conflated, but then *NOT*
 > conflating words is also harmful in many cases and on balance stemming is
 > a win.

I am interested in one special kind of stemming.

Say my user queries for "protein" and a document says "non-protein". 
Will Xapian match it? If so, is it possible to disable such matches?
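
To make the question concrete, here is a toy sketch in Python. It is not 
Xapian code: both tokenizers and the matches() helper are hypothetical, 
made up only to show the two behaviours being asked about. Which of them 
(if either) Xapian actually has is the open question.

```python
import re

def split_on_hyphen(text):
    """Hypothetical tokenizer A: any non-alphanumeric character ends a term."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keep_hyphenated(text):
    """Hypothetical tokenizer B: hyphens inside a word stay part of the term."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def matches(query, document, tokenize):
    """True if every query term appears among the document's terms."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))

doc = "a non-protein compound"
print(matches("protein", doc, split_on_hyphen))  # True: "non-protein" became "non" + "protein"
print(matches("protein", doc, keep_hyphenated))  # False: "non-protein" stayed one term
```

In this sketch the outcome is decided by how the text is split into terms, 
before any stemmer would see the words.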

Sorry, I still don't have Omega running (the reasons are explained in my 
next email question).

-- 
Peter Masiar, Yale Center for Medical Informatics

A: Because it messes up the flow of reading.
Q: Why is top-posting often frowned upon?



