Letor: returning MSet after re-ranking

Olly Betts olly at survex.com
Thu Aug 4 06:32:36 BST 2016


On Sun, Jul 31, 2016 at 05:43:56PM +0100, James Aylett wrote:
> On Sun, Jul 31, 2016 at 12:44:16AM +0100, Olly Betts wrote:
> 
> > Would a method which swapped two elements of an MSet provide what you
> > need?  That would provide a more generic way to adjust the ranking of
> > an MSet which for example could be used to implement a diversification
> > feature or something like SQL "GROUP BY".
> 
> Isn't the most common use going to be that the client (letor or
> whatever) knows what documents are in what order, possibly with what
> new weights?

That's probably the case for letor - I don't think it would be for a
"GROUP BY" feature.  Not sure about diversification - there are a
lot of different approaches to that from what I've seen.

> I'd have thought people will end up writing the same boilerplate code
> if they have only swap / set weight options, but maybe I'm not seeing
> something.

It may not be the perfect API, but it at least offers more flexibility
that just passing some new weights.  It is quite a primitive operation,
but that makes it a good building block too.

Say you have the suggested method:

rerank_mset(vector<double> updated_weights)

(Really that should be passed by const reference, but never mind that
for now.)

Then you couldn't implement the GROUP BY feature - you'd have to change
the weights of the entries you want to move to group with others (which
may not be desirable, and there may not be a suitable weight to assign
as there may be adjacent entries to the target one with the same
weight - in that case you'd have to arbitrarily change the weights of
documents you aren't even moving).  And what about GROUP BY on an MSet
which wasn't sorted by weight in the first place?  You'd have to make
up entirely fictional weights which match the sorted order.

And what if you don't have the weights as vector<double>?  People will
have to write boilerplate code to construct a suitable vector<double>.

Providing a new list of weights to re-sort the MSet by is also an
inherently O(n*log(n)) operation, whereas providing the more primitive
swap operation may have a better complexity.  Not an issue for n=10,
but reranking will often work better on a fairly large input set.

Cheers,
    Olly



More information about the Xapian-devel mailing list