[Xapian-discuss] Lucene ranking

James Aylett james-xapian at tartarus.org
Fri Oct 29 10:30:25 BST 2004


On Thu, Oct 28, 2004 at 08:08:06PM +0100, Olly Betts wrote:

> It does sound like he's encountered it in real world use.  I was just
> saying it's hard to reason reliably based on this example.  Anyway,
> Xapian seems to get this right and fixing Lucene is somebody else's
> problem!

:-)

> > Having now looked at the BM25 documentation again, and almost
> > understood it (:-), I think I see what's going on here. (I just tried
> > fiddling with the constructor parameters of Xapian::BM25Weight to no
> > avail - this was through wrappers, which may be something to do with
> > the fact that many of the important bits of this class, including the
> > constructor, are inline.
> 
> Why should that cause problems with the bindings?

That's just a wild stab in the dark - I'm assuming that
Xapian::Enquire::set_weighting_scheme() does actually work, and that
the variables in Xapian::BM25Weight are used; given that, I was
looking for any obvious differences between this class and others that
I've wrapped successfully, specifically something that might cause the
problems I was seeing. I'll try to investigate further and get them
wrapped at some point - I've just realised that BoolWeight would be
useful for something at work (which is a slight abuse of Xapian, but I
can't think of anything off-the-shelf that would be more
appropriate).

> > How often are we constructing these things that inline constructors
> > are needed?)
> 
> Not especially often, but since one constructor simply initialises to
> fixed values and the other clips parameters to valid ranges and
> initialises members with them, they're good candidates for inlining.
> The range checks will disappear if you initialise with constant values,
> which is a common case.

True. I was just wondering, in case it turns out that SWIG doesn't play
well with inlined constructors (I can't think why it wouldn't, and
there's no explicit mention of it in the manual).

> The other 3 inlined methods are virtual, so there's probably little
> point having them in the header, since the object will almost always
> be used as a Weight rather than a BM25Weight once it is constructed.
> So the compiler won't ever actually be able to inline them.

Right.

> > (b) making it obvious how we get from BM25
> > as we document it to the formula that Xapian::BM25Weight
> > implements. After staring at bm25.html for about ten minutes I've
> > finally figured out that it is actually telling me that BM25 /is/ what
> > we're using (with E=1), but with the BM11 term frobbed so that it
> > doesn't disappear on L=1. That could be a little clearer :-)
> 
> IIRC, the formula is adjusted by a constant factor to make sure
> something is never negative.  But yes, that should be documented.

The BM11 term is effectively divided by (1-L), so that makes
sense.
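The arithmetic can be written out, assuming the BM11-style correction term takes the usual Okapi form k_2 n_q (1-L)/(1+L), where n_q is the query length and L is the document length divided by the average document length, and assuming the adjustment is an added constant k_2 n_q (these symbols are assumptions here, not quoted from bm25.html):

```latex
k_2 n_q \frac{1-L}{1+L} + k_2 n_q
  = k_2 n_q \frac{(1-L) + (1+L)}{1+L}
  = \frac{2 k_2 n_q}{1+L}
  = \frac{2}{1-L} \cdot k_2 n_q \frac{1-L}{1+L} \qquad (L \neq 1)
```

Under those assumptions, the added k_2 n_q is the same for every document scored against a given query, so it cannot change the ranking; it makes the term non-negative for L >= 0 and k_2 >= 0, leaves it equal to k_2 n_q rather than 0 at L = 1, and, up to a factor of 2, amounts to dividing the original term by (1-L).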

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org