[Xapian-discuss] about stemming

durga bidaye doubtfire40008 at gmail.com
Sun Apr 2 05:57:37 BST 2006


Hi

I have a question about stemming. We stem terms before storing them in the
index, and we stem the query terms again at search time. This means stemming
is a lossy process. Consider an example: suppose "footballer" and "footballs"
were given as terms to be indexed and both were stemmed to "footbal". If we
then give "footballs" as the query, we get both the document containing
"footballs" and the document containing "footballer" as search results with
equal ranking (in the absence of other factors such as within-document
frequency). Ideally, the document containing "footballs" should rank higher
and the one containing "footballer" lower. Is there a mechanism in Xapian
that makes this kind of ranking possible? If not, why not? Is it because
such a mechanism would slow down Xapian's search speed?
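
A minimal sketch of the ranking described above, in plain Python rather than
Xapian's API. Everything here is an assumption made for illustration: the
STEMS table is a hypothetical stand-in for a real stemmer, and build_index,
search and exact_bonus are invented names. The idea is to index both the
stemmed and the unstemmed form of each word, then add a bonus when the
unstemmed query word also matches.

```python
# Toy illustration only: a hypothetical stem table stands in for a real stemmer.
STEMS = {"footballs": "footbal", "footballer": "footbal"}


def stem(word):
    """Return the stem from the toy table, or the word itself if unknown."""
    return STEMS.get(word, word)


def build_index(docs):
    """Map each stem and each exact word form to the doc ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(("stem", stem(word)), set()).add(doc_id)
            index.setdefault(("exact", word), set()).add(doc_id)
    return index


def search(index, query, exact_bonus=1.0):
    """Score 1.0 for a stem match, plus exact_bonus if the exact form matches."""
    word = query.lower()
    scores = {}
    for doc_id in index.get(("stem", stem(word)), set()):
        scores[doc_id] = 1.0
    for doc_id in index.get(("exact", word), set()):
        scores[doc_id] = scores.get(doc_id, 0.0) + exact_bonus
    # Highest score first; ties are broken by doc id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {1: "the footballer scored", 2: "two footballs were lost"}
index = build_index(docs)
results = search(index, "footballs")
print(results)  # [(2, 2.0), (1, 1.0)]
```

With exact_bonus set to 0.0 the two documents tie, which is the behaviour
the question describes. The cost of this approach is a larger index, since
every word is stored twice.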


Thanks

Durga
doubtfire40008 at gmail.com

