[Xapian-discuss] about stemming

Perdana Panduwana panduwana at gmail.com
Mon Apr 3 02:08:21 BST 2006


I think what Durga means is:
- When I search for "footballer", footballer and footballs have equal
weighting (due to stemming)
- Also when I search for "footballs", the same thing applies: footballer and
footballs have equal weighting
- Is it possible that when I search for "footballer", footballer gets more
weighting than footballs, and when I search for "footballs", footballs gets
more weighting than footballer? It seems impossible, because both words are
stemmed to "footbal". Is there any setting that makes this possible?
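To make the point concrete, here is a toy sketch of why no query-time setting can tell the two words apart. It is plain Python, not Xapian's API. Only the stemmed term is stored in the index, and a query for either word is stemmed to that same term before matching. Two simplifying assumptions: the hypothetical STEMS table stands in for a real English stemmer, and a document's score is taken to be just the wdf of the stemmed query term.

```python
from collections import Counter

# Hypothetical stemmer output matching this thread: both forms -> "footbal".
STEMS = {"footballer": "footbal", "footballs": "footbal"}

def stem(word):
    """Toy stemmer: table lookup, falling back to the word itself."""
    return STEMS.get(word, word)

def index(text):
    """Index a document: only stemmed terms are kept, with their wdf."""
    return Counter(stem(w) for w in text.lower().split())

def score(doc_terms, query_word):
    """Toy score: wdf of the stemmed query term (an assumption, not BM25)."""
    return doc_terms[stem(query_word)]

docs = {
    "doc_footballer": index("footballer"),
    "doc_footballs": index("footballs"),
}

# Both queries reduce to the term "footbal", so both documents score equally.
for q in ("footballer", "footballs"):
    print(q, {name: score(terms, q) for name, terms in docs.items()})
```

Whichever word is typed, the two documents are indistinguishable once the original form has been discarded at index time.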

CMIIW

--
Perdana Panduwana
panduwana at gmail.com



On 4/2/06, Olly Betts <olly at survex.com> wrote:
>
> On Sun, Apr 02, 2006 at 10:27:37AM +0530, durga bidaye wrote:
> > Suppose footballer and footballs were given as terms to be indexed
> > and both were stemmed to footbal. Now when we give "footballs" as the
> > query, we get both the document containing footballs and the document
> > containing footballer as search results, with equal ranking (in the
> > absence of other factors like within-document frequency, etc.).
>
> Correct.
>
> > But ideally it should have given the document containing "footballs"
> > a higher ranking and the one containing footballer a lower ranking.
>
> I don't follow why.  Both "footballs" and "footballer" indicate that a
> document is "about terms that stem to 'footbal'".
>
> Perhaps you think that "footballer" indicates less "aboutness" than
> "footballs"?  I think that's a highly subjective judgement - it may
> be true sometimes but in other cases the reverse is true.  For example,
> consider the query: footballers' wives - there "footballer" indicates
> more relevance than "footballs".
>
> > Isn't there a mechanism in xapian which makes this kind of ranking
> > possible?
>
> If you really want to do that, you can set a higher "wdfinc" when adding
> postings for "footbal" when it comes from "footballs" than when it comes
> from "footballer".
>
> But you'll need to compile a list of which unstemmed forms indicate more
> "aboutness" than others, and I'm unconvinced it's really a sensible
> approach.
>
> Cheers,
>     Olly
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
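Olly's wdfinc suggestion can be sketched as follows. This is plain Python rather than Xapian code. The toy stemmer and the WDFINC table of per-form increments are hypothetical. In a real Xapian indexer, the increment would be passed as the wdfinc argument of Xapian::Document::add_posting(term, pos, wdfinc).

```python
from collections import Counter

# Hypothetical stemmer output matching this thread: both forms -> "footbal".
STEMS = {"footballer": "footbal", "footballs": "footbal"}

# Hypothetical hand-compiled list of unstemmed forms judged to indicate
# more "aboutness"; any form not listed gets the default increment of 1.
WDFINC = {"footballs": 2, "footballer": 1}

def stem(word):
    """Toy stemmer: table lookup, falling back to the word itself."""
    return STEMS.get(word, word)

def index(text):
    """Return {stemmed term: wdf}, adding WDFINC[word] per occurrence
    (in Xapian this would be the wdfinc passed to add_posting)."""
    wdf = Counter()
    for word in text.lower().split():
        wdf[stem(word)] += WDFINC.get(word, 1)
    return wdf

doc_a = index("footballs")
doc_b = index("footballer")
print(doc_a["footbal"], doc_b["footbal"])  # 2 1
```

Because the boost is applied at index time, a document containing "footballs" outranks one containing "footballer" (all else being equal) for every query that stems to "footbal", including the query "footballer". So it addresses the original request, but not the query-dependent ranking asked about at the top of this message.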

