How to get the serialise score returned in Xapian::KeyMaker->operator().

Olly Betts olly at survex.com
Wed Jan 24 02:46:35 GMT 2018


On Tue, Jan 23, 2018 at 12:55:31AM +0800, 张少华 wrote:
> We implemented our score function using PostingSource instead of
> KeyMaker, based on your Python example and the Xapian source code.
> A simple demo is here:
> https://github.com/xiangqianzsh/xapian_leaning/blob/master/postingsource/ExternalWeightPostingSource.h

I'd just put the get_weight() and get_maxweight() implementations into
your ExternalWeightPostingSource class - the WeightSource class doesn't
seem to serve a useful purpose and just adds virtual method call
overheads (and those can add up, unless the compiler can devirtualise
the calls, which compilers are getting better at).

In the Python example, the WeightSource class is just meant to be a
placeholder for "some source of weights" - it isn't meant to be a
literal recommendation for how to write such a class.
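
As a rough sketch of that suggestion: the code below keeps the weights
inside the source class itself, so answering get_weight() is a plain
array lookup rather than a second virtual call into a separate object.
SourceInterface is only a stand-in and is NOT Xapian::PostingSource;
ExternalWeightSource, move_to() and the weights vector are hypothetical
names, and non-negative weights are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the part of a posting source interface that matters here.
// This is NOT Xapian::PostingSource; it only models "the caller asks the
// source for a weight through a virtual call".
class SourceInterface {
  public:
    virtual ~SourceInterface() {}
    virtual double get_weight() const = 0;
};

// The weights live in the source itself, so there is no extra virtual
// hop through a separate WeightSource object.  Marking the class final
// also gives the compiler a better chance of devirtualising calls.
class ExternalWeightSource final : public SourceInterface {
    std::vector<double> weights_;  // hypothetical: one weight per slot
    double max_weight_;            // upper bound, assuming weights >= 0
    std::size_t current_;          // hypothetical cursor into weights_

  public:
    explicit ExternalWeightSource(std::vector<double> weights)
        : weights_(std::move(weights)), max_weight_(0.0), current_(0) {
        assert(!weights_.empty());
        for (double w : weights_) {
            if (w > max_weight_) max_weight_ = w;
        }
    }

    // Position the cursor on a slot (stands in for next()/skip_to()).
    void move_to(std::size_t slot) {
        assert(slot < weights_.size());
        current_ = slot;
    }

    double get_weight() const override { return weights_[current_]; }
    double get_maxweight() const { return max_weight_; }
};
```

With weights {0.5, 2.0, 1.25}, get_maxweight() gives 2.0, and after
move_to(2) get_weight() gives 1.25.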

> But we found that using PostingSource is slower than KeyMaker.

What's the relative speed difference you're seeing?

> I think the reason may be this: we only use a single Xapian::Query
> built from a PostingSource, and the upper bound on our get_weight()
> cannot help with a single PostingSource.  So some optimisations don't
> help and just waste time instead.  What do you think?

If I follow, you're saying your query is just an
ExternalWeightPostingSource object?

If so, what is the query in the KeyMaker case?

I'd expect a KeyMaker to also be fairly slow if the query is
Query::MatchAll or similar as the sort key will need building for every
matching document, like how the PostingSource will need to calculate the
weight for every matching document.
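
A toy model of why both approaches cost about the same on a match-all
query: the score callback below plays the role of either a KeyMaker
building a sort key or a PostingSource computing a weight, and it runs
once per matching document however few results are kept.  This is not
Xapian's matcher; rank_match_all(), RankResult and ScoredDoc are
hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model only, not Xapian code.
struct RankResult {
    std::vector<unsigned> top;  // ids of the best k documents
    std::size_t score_calls;    // how many times score() was invoked
};

typedef std::pair<double, unsigned> ScoredDoc;  // (score, document id)

static bool higher_score_first(const ScoredDoc& a, const ScoredDoc& b) {
    return a.first > b.first;
}

// Every document "matches", so score() runs doc_count times even though
// only k results are returned.
RankResult rank_match_all(unsigned doc_count, std::size_t k,
                          const std::function<double(unsigned)>& score) {
    RankResult result;
    result.score_calls = 0;

    std::vector<ScoredDoc> scored;
    scored.reserve(doc_count);
    for (unsigned did = 1; did <= doc_count; ++did) {
        scored.push_back(ScoredDoc(score(did), did));
        ++result.score_calls;
    }

    // Keep only the k highest scoring documents, best first.
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(), higher_score_first);
    for (std::size_t i = 0; i < k; ++i) {
        result.top.push_back(scored[i].second);
    }
    return result;
}
```

Asking for the top 3 of 1000 documents still calls score() 1000 times,
so the per-document cost of the callback dominates in either design.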

> Also, we found that the BM25 algorithm is fast in Xapian, so we are
> wondering whether we can modify our get_weight() function to fit the
> BM25 algorithm.  If so, the type of a document's termfreq would need
> to be double.  Would it work to just re-typedef Xapian::termcount to
> double?  Would that have a negative impact elsewhere in the Xapian
> source?

It'll stop it compiling, which is fairly negative.  Xapian::termcount
needs to be an unsigned integer, and there are assertions to that effect
you'd hit.  I'd think it would be a significant project to change that.
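
To illustrate the kind of breakage meant here, a small self-contained
sketch: the typedef below is a stand-in for Xapian::termcount (it does
not include any Xapian header), and is_valid_count_type() and halve()
are hypothetical helpers showing a compile-time check and an
integer-only operation that a floating point type would fail.

```cpp
#include <type_traits>

// Stand-in for Xapian::termcount, which is an unsigned integer type.
typedef unsigned termcount;

// A compile-time check of the sort that integer-only code relies on.
template <typename T>
constexpr bool is_valid_count_type() {
    return std::is_integral<T>::value && std::is_unsigned<T>::value;
}

static_assert(is_valid_count_type<termcount>(),
              "termcount must be an unsigned integer type");

// Changing the typedef to double would make the assertion above fail at
// compile time, and integer-only operators such as >> would be rejected
// as well.
constexpr termcount halve(termcount n) { return n >> 1; }
```

Switching the typedef to double stops this compiling in two places:
the static_assert fires, and the shift in halve() is rejected.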

Implementing your weighting as a Xapian::Weight subclass is a potential
option if it works as a sum of weight components from the terms in the
query.  But if you need to make Xapian::termcount a floating point type
to do it then I suspect this isn't a good approach.
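
A minimal sketch of what "a sum of weight components from the terms in
the query" means: each term contributes a value computed from its own
statistics, and the document's weight is the total.  The formula is a
simplified BM25-style saturation curve without length normalisation,
chosen only for illustration; it is not Xapian's BM25Weight, and
TermStats, term_factor, term_component() and document_weight() are
hypothetical names.  Note that wdf stays an integer count.

```cpp
#include <vector>

// Hypothetical per-term statistics for one document.
struct TermStats {
    unsigned wdf;        // within-document frequency (an integer count)
    double term_factor;  // hypothetical per-term constant, e.g. idf-like
};

// One term's contribution.  Assumes k1 > 0 so the denominator is never
// zero.  The value rises with wdf but saturates below
// term_factor * (k1 + 1).
double term_component(const TermStats& t, double k1) {
    return t.term_factor * ((k1 + 1.0) * t.wdf) / (k1 + t.wdf);
}

// The document weight is just the sum of the per-term contributions.
double document_weight(const std::vector<TermStats>& terms, double k1) {
    double total = 0.0;
    for (const TermStats& t : terms) {
        total += term_component(t, k1);
    }
    return total;
}
```

With k1 = 1, a term with wdf 1 and factor 2.0 contributes 2.0, a term
with wdf 3 and factor 1.0 contributes 1.5, and the document weight is
3.5.  A scheme that cannot be split up per term like this doesn't fit
the model.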

Cheers,
    Olly


