[Xapian-discuss] Sort By Relevance+Value?

Olly Betts olly at survex.com
Mon Feb 26 01:25:01 GMT 2007


On Sat, Feb 24, 2007 at 02:35:10PM +0800, Robert Young wrote:
> Pardon me if this has been asked before, is there a way to do a sort by:
> 
> (relevance + constant * value)
> 
> efficiently? Is this planned or feasible?

It's not currently possible, but provided you can give an upper bound
for value, it's possible to implement and should be pretty efficient
if the bound is reasonably tight.

I looked at the idea some time ago, and the experimental "match bias"
feature was the result - it is hardwired to add a weight which decays
exponentially with date (the idea being that for something like a news
search, more recent articles will tend to be more relevant).
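
As a rough illustration of an exponentially decaying date weight, here
is a minimal Python sketch.  The function name and its parameters
(max_bias, half_life_seconds) are assumptions made for this example and
are not taken from Xapian's code; the point is only that such a weight
has an obvious upper bound (max_bias, reached at age zero).

```python
def date_bias(age_seconds, max_bias, half_life_seconds):
    """Extra weight for a document of the given age (hypothetical helper).

    The bias halves every half_life_seconds, so it never exceeds
    max_bias, which is the upper bound a matcher could rely on.
    """
    # Treat documents dated in the future as brand new, so the
    # stated upper bound always holds.
    age = max(0.0, float(age_seconds))
    return max_bias * 0.5 ** (age / half_life_seconds)


if __name__ == "__main__":
    day = 24 * 60 * 60
    for age_days in (0, 1, 7, 30):
        print(age_days, date_bias(age_days * day, 10.0, 7 * day))
```

The bias is simply added to the statistical relevance weight, so a
newer document needs less relevance to reach the same total score.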

However, Rusty Conover's patch to add an "ExternalSourcePostList" would
allow a general implementation of this idea:

http://thread.gmane.org/gmane.comp.search.xapian.general/4061

It just needs to gain the ability to return values for get_weight() and
get_maxweight(), and then you can implement a subclass which matches
every document and returns "constant * value" as the weight.  This can
then be ANDed with the query, so that each matching document scores
"relevance + constant * value".
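
To make the idea concrete, here is a toy sketch in plain Python.  It is
not Xapian's API and not Rusty's patch: the class ValueWeightSource, the
function and_with_query(), and the assumption that query matches arrive
sorted by decreasing relevance are all invented for illustration.  Only
the method names get_weight() and get_maxweight() come from the
description above.  It shows how the upper bound lets the matcher stop
early.

```python
import heapq


class ValueWeightSource:
    """Hypothetical source that matches every document.

    Its weight for a document is constant * value, and its maximum
    possible weight is constant * max_value (the caller's upper bound).
    """

    def __init__(self, values, constant, max_value):
        self.values = values          # docid -> value
        self.constant = constant
        self.max_value = max_value    # must be >= every stored value

    def get_weight(self, docid):
        return self.constant * self.values.get(docid, 0.0)

    def get_maxweight(self):
        return self.constant * self.max_value


def and_with_query(query_matches, source, k):
    """Return the top k (score, docid) pairs, best first.

    query_matches is a list of (docid, relevance) pairs, assumed here
    to be sorted by decreasing relevance.  Since the source matches
    every document, the AND keeps exactly the query's matches and
    sums the two weights.
    """
    heap = []  # min-heap holding the best k scores seen so far
    bound = source.get_maxweight()
    for docid, relevance in query_matches:
        # No remaining document can beat the current k-th best score,
        # because its relevance is no higher and its extra weight is
        # at most `bound`.
        if len(heap) == k and relevance + bound <= heap[0][0]:
            break
        score = relevance + source.get_weight(docid)
        if len(heap) < k:
            heapq.heappush(heap, (score, docid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, docid))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    values = {1: 0.2, 2: 0.9, 3: 0.5, 4: 1.0}
    source = ValueWeightSource(values, constant=2.0, max_value=1.0)
    matches = [(1, 3.0), (3, 2.5), (2, 1.0)]  # document 4 does not match
    print(and_with_query(matches, source, k=2))
```

The tighter the bound, the sooner the loop can stop; with a very loose
bound every matching document has to be scored.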

> Incidentally, I believe by a very crude simplification, relevance +
> pagerank is what Google is using?

They don't openly document what they use, but it's presumably some
function of statistical relevance and pagerank.  But other factors may
be involved, and it might not be a linear combination.

Cheers,
    Olly


