[Xapian-devel] Custom weight factors - pushing the relevancy ranking how we want it

Olly Betts olly at survex.com
Fri Dec 17 10:37:12 GMT 2004


On Fri, Dec 17, 2004 at 11:06:41AM +0100, Michiel Roding wrote:
> As forums are, the content that is relevant to a search is not just 
> determined by the frequency or location of the terms; the date the topic 
> has been last modified is important as well.

The match bias code will probably be useful here.  I need to tidy up the
UI, which is slowly bubbling up my todo list.  But it'll allow a
date-dependent extra weight term, so more recent topics can get a boost.

> Another issue we find is that the number of results is so overwhelming 
> that the user is unable to find the correct topic for his needs. Combining 
> this with some statistics, we found that a very large part of the 
> queries to Omega are the same. Keywords like windows, xp, dvd etc. are 
> very popular.
> Therefore, we are contemplating building a "does this topic meet your 
> search?" feature to store which topics are most relevant to the queries 
> as defined by the users.

One problem with this approach is that different users may want
different results for the same query.  Some searching for "xp" may want
windows xp, others extreme programming.
    
Hopefully you'll end up with a small enough set of favoured results that
this won't be too much of a problem though.

And if you built a second Xapian database where each topic is indexed
only by terms which people have voted for, then you could use topterms
to allow users to narrow in on particular meanings.
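
As a rough illustration of the voted-terms idea, here's a minimal Python
sketch.  It uses plain in-memory dictionaries as a stand-in for the second
Xapian database, and the names VoteIndex, vote and suggest_terms are made
up for this example.  It only counts terms that were voted for alongside
the query term; it is not how topterms ranks terms.

```python
from collections import Counter, defaultdict


class VoteIndex:
    """Hypothetical stand-in for a database indexed only by voted-for terms."""

    def __init__(self):
        # topic id -> Counter of query terms users voted for on that topic
        self.terms_by_topic = defaultdict(Counter)

    def vote(self, query, topic_id):
        """Record that a user marked topic_id as relevant to query."""
        for term in query.lower().split():
            self.terms_by_topic[topic_id][term] += 1

    def matching_topics(self, term):
        """Topics which have received at least one vote for term."""
        return [t for t, terms in self.terms_by_topic.items() if term in terms]

    def suggest_terms(self, term, limit=5):
        """Other terms voted for on the same topics, most frequent first."""
        counts = Counter()
        for topic in self.matching_topics(term):
            for other, n in self.terms_by_topic[topic].items():
                if other != term:
                    counts[other] += n
        return [t for t, _ in counts.most_common(limit)]
```

Someone searching for "xp" could then be offered "windows" or "extreme"
as ways to narrow the query, depending on what earlier users voted for.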

> Other features could be a lame attempt at the PageRank relevancy, 
> storing whether a user almost immediately skips a topic (irrelevant), etc.
> 
> But, this needs to be stored (easy) and processed by Xapian in the sorting.
> 
> How could we go about this? Does Xapian somehow support these custom 
> weight factors?

The match bias code again.  Currently it's hardwired to expect a Unix
timestamp and to give an exponentially decaying weight from the present.
But the concept is that it could be used for all sorts of things,
including this.
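
To make the idea concrete, here's a small Python sketch of an
exponentially decaying bias added to the ordinary match weight.  It is
not the match bias code itself: the function names and the max_bias and
halflife_secs parameters are assumptions for illustration, and the way
the bias is combined with the text weight (simple addition) is just one
plausible choice.

```python
def recency_bias(doc_timestamp, now, max_bias, halflife_secs):
    """Extra weight for a document last modified at doc_timestamp.

    Both timestamps are Unix times in seconds.  The bias is max_bias for
    a document modified right now and halves every halflife_secs.
    """
    age = max(0.0, now - doc_timestamp)  # treat future dates as "now"
    return max_bias * 0.5 ** (age / halflife_secs)


def rank(results, now, max_bias=1.0, halflife_secs=30 * 24 * 3600):
    """Order topics by text weight plus recency bias, best first.

    results is a list of (topic_id, text_weight, last_modified) tuples.
    """
    scored = [
        (weight + recency_bias(ts, now, max_bias, halflife_secs), topic_id)
        for topic_id, weight, ts in results
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [topic_id for _, topic_id in scored]
```

Any other per-topic number (votes, how often a topic is skipped) could be
fed through a similar function in place of the timestamp; max_bias
controls how much it can outweigh the text match.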

Cheers,
    Olly



