[Xapian-devel] New Idea on Ranking in IR

Olly Betts olly at survex.com
Mon Apr 4 03:46:47 BST 2011


On Sun, Apr 03, 2011 at 08:57:36PM +0530, Parth Gupta wrote:
> Click-through measurements are certainly a good way to prepare training
> data automatically. But what I have in mind is this: if we treat
> relevance as a binary variable, many relevance judgements are already
> available for the ad-hoc retrieval task from good IR conferences like
> TREC or FIRE, so we can prepare the feature vectors from them. That would
> be a first benchmark for the project. It would also be reliable, because
> the judgements are made by humans and cover both relevant and
> non-relevant documents, so the sample is unbiased and good for machine
> learning.

That's OK for developing this.  But it seems likely that training in one
domain won't transfer reliably to another, so someone developing a
search which uses this will really need their own training data.

So for a developer wanting to deploy this, being able to automatically
crowd-source their training data by tracking clicks on search results is
much more appealing than having to invest time and/or money in getting
relevance judgements produced specially.  Click data also allows training
to be a more continuous process, which is beneficial for sites where
topics evolve fairly quickly with time (like news sites).

The click data is almost certainly going to be noisier, which might be
an issue for training, but for a busy site you can easily produce much
more of it than you can with explicit relevance judgements, so perhaps
the noise can be filtered out if it is an issue.
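
As a rough illustration of one way such filtering might work (a sketch
only, not anything Xapian provides): aggregate clicks per (query, document)
pair, drop pairs seen too few times to trust, and threshold the click rate
to get a binary label.  The log format, the function name and both
thresholds are assumptions made up for this example, and it ignores
effects such as result position.

```python
from collections import defaultdict


def labels_from_clicks(log, min_impressions=5, ctr_threshold=0.3):
    """Turn raw click events into binary relevance labels.

    log: iterable of (query, doc_id, was_clicked) tuples, one per time a
    result was shown.  The format and both thresholds are hypothetical.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in log:
        key = (query, doc_id)
        shown[key] += 1
        if was_clicked:
            clicked[key] += 1

    labels = {}
    for key, n in shown.items():
        if n < min_impressions:
            continue  # too few observations to trust, so skip as noise
        labels[key] = 1 if clicked[key] / n >= ctr_threshold else 0
    return labels


# Made-up log: doc1 is clicked often, doc2 never, doc3 is seen only once.
log = (
    [("xapian", "doc1", True)] * 4
    + [("xapian", "doc1", False)] * 2
    + [("xapian", "doc2", False)] * 6
    + [("xapian", "doc3", True)]
)
print(labels_from_clicks(log))
```

Here doc1 ends up labelled 1, doc2 labelled 0, and doc3 is dropped because
a single impression is below the minimum.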

> Also, I am very new to the formalities of submitting a GSoC application,
> so if things happen early I would have enough time to shape the
> application in light of feedback.

The formalities are that you need to file an application here before
19:00 UTC on April 8th:

http://socghop.appspot.com/gsoc/org/google/gsoc2011/xapian

But it's a good idea to get your application in sooner than that to
give us a chance to review it and make comments.  There's also likely to
be a surge in proposals as the deadline nears.  You're able to make
changes up until the deadline.

Cheers,
    Olly


