[Xapian-discuss] Sort By Relevance+Value?

Doug Shore doug.shore at axial.net
Fri Apr 18 15:49:59 BST 2014


Olly Betts <olly <at> survex.com> writes:

> 
> On Sat, Feb 24, 2007 at 02:35:10PM +0800, Robert Young wrote:
> > Pardon me if this has been asked before, is there a way to do a sort by:
> > 
> > (relevance + constant * value)
> > 
> > efficiently? Is this planned or feasible?
> 
> It's not currently possible, but provided you can give an upper bound
> for value, it's possible to implement and should be pretty efficient
> if the bound is reasonable.
> 
> I looked at the idea some time ago, and the experimental "match bias"
> was the result - this is hardwired to add a weight which decays
> exponentially with date (the idea being that for something like a news
> search, more recent articles will tend to be more relevant).
> 
> However, Rusty Conover's patch to add an "ExternalSourcePostList" would
> allow a general implementation of this idea:
> 
> http://thread.gmane.org/gmane.comp.search.xapian.general/4061
> 
> It just needs to gain the ability to return values for get_weight() and
> get_maxweight(), and then you can implement a subclass which indexes
> every document and returns "constant * value" as the weight.  This can
> then be ANDed with the query to obtain the desired result.
> 
> > Incidentally, I believe by a very crude simplification, relevance +
> > pagerank is what Google is using?
> 
> They don't openly document what they use, but it's presumably some
> function of statistical relevance and pagerank.  But other factors may
> be involved, and it might not be a linear combination.
> 
> Cheers,
>     Olly
> 

I am using the Python bindings and trying to do something similar to the above.

In particular I want to be able to normalize the relevance and 
value scores so I can do a weighted sum for the final document weighting.

Correct me if I am wrong, but subclassing PostingSource will allow me to get
weights based on document values, but there is no way to determine a
multiplier that is scaled based on the relevance weighting?

I am looking at adding an operator that is similar to OP_AND_MAYBE, but does a
normalized weighted sum rather than a simple addition of the left and right
posting lists' weights.
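
To make the arithmetic I have in mind concrete, here is a minimal pure-Python
sketch. It makes no Xapian calls at all; the function names, the alpha
parameter and the sample numbers are made up for illustration, and it assumes
an upper bound is known in advance for both the relevance weight and the value.

```python
def normalized_weighted_sum(relevance, value, max_relevance, max_value, alpha=0.5):
    """Scale both inputs to 0..1 using their upper bounds, then blend them.

    alpha is the share given to relevance; (1 - alpha) goes to the value.
    All names here are hypothetical and not part of the Xapian API.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rel = relevance / max_relevance if max_relevance > 0 else 0.0
    val = value / max_value if max_value > 0 else 0.0
    return alpha * rel + (1.0 - alpha) * val


def rank(candidates, max_relevance, max_value, alpha=0.5):
    """Sort (docid, relevance, value) tuples by combined score, best first."""
    scored = [
        (docid, normalized_weighted_sum(r, v, max_relevance, max_value, alpha))
        for docid, r, v in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up sample data: (docid, relevance weight, document value).
candidates = [(1, 8.0, 10.0), (2, 4.0, 90.0), (3, 6.0, 50.0)]
ranking = rank(candidates, max_relevance=8.0, max_value=100.0, alpha=0.7)
# With these numbers doc 1 ranks first, then doc 3, then doc 2.
```

The sticking point is the max_relevance argument: the sketch is simply handed
that number, whereas I do not see how a PostingSource subclass could learn it.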

Am I missing some easy subclassing implementation?

Thanks in advance.

Cheers,
Doug




