[Xapian-discuss] about stemming

durga bidaye doubtfire40008 at gmail.com
Tue Apr 4 04:12:05 BST 2006


Hello

I am sorry I started a new thread, so I am forwarding the mail to the
original thread, this one. I mean exactly what Perdana Panduwana says.
Can you tell me whether it is possible to have a situation where
searching for "footballer" gives the document containing the term
"footballer" a higher ranking and the document containing "football" a
lower one? Similarly, when searching for "football", the document
containing the term "football" should get a higher ranking and the
document containing "footballer" a lower one. This is what I meant by my
query. I think Olly misunderstood my question. Can you answer it now,
Olly?

Thanks

Durga


On 4/2/06, Olly Betts <olly at survex.com> wrote:
>
> On Sun, Apr 02, 2006 at 10:27:37AM +0530, durga bidaye wrote:
> > Suppose footballer and footballs were given as terms to be indexed
> > and both were stemmed to footbal. Now when we give "footballs" as the
> > query, we will get both the document containing footballs and the
> > document containing footballer as search results, with equal ranking
> > (in the absence of other factors like within-document frequency, etc.).
>
> Correct.
>
> > But ideally it should have given document containing "footballs"
> > higher ranking and the one containing footballer lower ranking.
>
> I don't follow why.  Both "footballs" and "footballer" indicate that a
> document is "about terms that stem to 'footbal'".
>
> Perhaps you think that "footballer" indicates less "aboutness" than
> "footballs"?  I think that's a highly subjective judgement - it may
> be true sometimes but in other cases the reverse is true.  For example,
> consider the query: footballers' wives - there "footballer" indicates
> more relevance than "footballs".
>
> > Isn't there a mechanism in xapian which makes this kind of ranking
> > possible?
>
> If you really want to do that, you can set a higher "wdfinc" when adding
> postings for "footbal" when it comes from "footballs" than when it comes
> from "footballer".
>
> But you'll need to compile a list of which unstemmed forms indicate more
> "aboutness" than others, and I'm unconvinced it's really a sensible
> approach.
>
> Cheers,
>     Olly
>
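Olly's suggestion (a larger wdfinc for "footbal" when it comes from "footballs" than when it comes from "footballer") can be illustrated without Xapian itself. The following is a minimal sketch under stated assumptions: the STEM_OF dictionary is a hypothetical stand-in for a real stemmer, covering only the forms the thread says stem to "footbal", and WDFINC is a hypothetical hand-compiled table of per-form increments of the kind Olly says you would have to build yourself.

```python
from collections import Counter

# Hypothetical stand-in for a real stemmer: per the thread, these
# unstemmed forms all stem to "footbal". Other words are left unchanged.
STEM_OF = {"footballs": "footbal", "footballer": "footbal", "football": "footbal"}

# Hypothetical hand-compiled table: how much each unstemmed form adds to
# the wdf of its stem. Forms not listed use the default increment of 1.
WDFINC = {"footballs": 2, "footballer": 1}


def stemmed_wdf(words):
    """Return the wdf of each stemmed term in a document, applying WDFINC."""
    wdf = Counter()
    for word in words:
        stem = STEM_OF.get(word, word)
        # With Xapian, this increment would be passed as the wdfinc
        # argument of Document::add_posting(term, pos, wdfinc).
        wdf[stem] += WDFINC.get(word, 1)
    return wdf


doc_a = stemmed_wdf(["footballs", "match"])   # "footbal" gets wdf 2
doc_b = stemmed_wdf(["footballer", "match"])  # "footbal" gets wdf 1
```

Because the increment is fixed when the document is indexed, the "footballs" document scores higher for the stemmed term "footbal" whichever of the forms appears in the query.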

