[Xapian-discuss] MSet order

Pierre-Alain Moellic pamoellic at gmail.com
Tue Mar 8 14:29:37 GMT 2011


For the Euclidean distance, I used the BM25Weight class as an example and
simply used the relation between the L2 distance and the "cosine" measure
(Q = query, D = document): ||Q-D||² = ||Q||² + ||D||² - 2 Q.D
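
As a quick sanity check of that relation, here is a minimal sketch in plain
Python. It does not use Xapian at all; the vectors q and d and the helper
names are made up for illustration only.

```python
import math

def dot(q, d):
    """Dot product Q.D of two equal-length vectors."""
    return sum(a * b for a, b in zip(q, d))

def squared_l2(q, d):
    """Squared Euclidean distance ||Q-D||^2, computed directly."""
    return sum((a - b) ** 2 for a, b in zip(q, d))

def squared_l2_from_dot(q, d):
    """Same quantity via ||Q||^2 + ||D||^2 - 2 Q.D."""
    return dot(q, q) + dot(d, d) - 2.0 * dot(q, d)

# Hypothetical query and document vectors.
q = [1.0, 2.0, 0.0]
d = [0.0, 1.0, 3.0]

direct = squared_l2(q, d)
via_dot = squared_l2_from_dot(q, d)
print(direct, via_dot, math.isclose(direct, via_dot))
```

Since ||Q||² is the same for every document in a given query, only ||D||² and
Q.D change the ordering.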

A non-linear transformation of the weights is probably a problem, since a
weighting scheme is mainly defined through the get_sumpart() method.
As I understand it, the final weight W of a document in Xapian is
W = w1 + w2 + ... + wN, where each wi is provided by get_sumpart().
So the transformation w --> 1 / (1 + w) should be applied to the final
weight W, not to the individual wi. Is it possible to change the final
weight in an MSet?
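
To illustrate why this matters, here is a small sketch in plain Python (no
Xapian calls; the per-term values and function names are invented for the
example). It compares applying 1 / (1 + w) to the summed weight W with
applying it to each wi before summing.

```python
def transform(w):
    """Map a distance-like weight (0 is best) to a score in (0, 1]."""
    return 1.0 / (1.0 + w)

def score_on_total(parts):
    """Apply the transform once, to the final weight W = w1 + ... + wN."""
    return transform(sum(parts))

def score_per_part(parts):
    """Apply the transform to each wi, then sum (what a per-term hook gives)."""
    return sum(transform(w) for w in parts)

# Hypothetical per-term contributions for two documents.
doc_a = [0.0, 5.0]   # total distance 5.0
doc_b = [2.0, 2.0]   # total distance 4.0, so B should rank first

print("on total:", score_on_total(doc_a), score_on_total(doc_b))
print("per part:", score_per_part(doc_a), score_per_part(doc_b))
```

With these numbers the transform on the total ranks B above A, as the
distances require, while the per-term version ranks A above B. So the two are
not interchangeable in general.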

pa.



2011/3/8 Olly Betts <olly at survex.com>

> On Tue, Mar 08, 2011 at 01:13:51PM +0000, Richard Boulton wrote:
> > There's no way to do this with the current Xapian matcher.  However,
> > what you can do is transform your weights so that they occur in
> > reverse order (note that they must still remain positive, though).
> >
> > So, one option would be to change your get_sumpart() function to
> > return "max - w", where w is the value you're currently returning, and
> > max is the value returned by get_maxpart().
> >
> > If get_maxpart() isn't returning a tight bound, eg, you've had to
> > return DBL_MAX for it, you might want to do a non-linear transform on
> > your data to squash it into a more reasonable range - though that
> > doesn't sound like it'll be a particular issue for you.
>
> I'd suggest transforming the weights via 1 / (1 + w)
>
> Then the best weight of 0 becomes 1, and larger (less good weights)
> head towards zero.  And get_maxpart() would just return 1.
>
> Cheers,
>     Olly
>

