[Xapian-devel] New Idea on Ranking in IR

Parth Gupta parthg.88 at gmail.com
Sun Apr 3 16:27:36 BST 2011

Click-through measurements are certainly a good way to prepare training
data automatically. But what I have in mind, if we treat relevance as a
binary variable, is to use the relevance judgements that are already
available for the ad-hoc retrieval task from good IR conferences like TREC
or FIRE, and prepare the feature vectors from them. That would give a first
benchmark for the project. It should be reliable too, because the judgements
are made by humans and cover both relevant and non-relevant documents, so
it should be a reasonably unbiased sample and good for machine learning.
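
As a rough sketch of how such judgements could be turned into training
data: the code below assumes the usual four-column TREC qrels layout
(topic, iteration, document number, relevance), maps any positive grade to
1 to get a binary label, and pairs each label with a feature vector. The
function names, the document ids, the three example features (term
frequency, TF-IDF, BM25) and the weight values are all made up for
illustration. Nothing here is an existing Xapian API.

```python
def parse_qrels(lines):
    """Parse TREC-style qrels lines: 'topic iteration docno relevance'.

    Returns {(topic, docno): 0 or 1}, treating any positive grade as relevant.
    """
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        labels[(topic, docno)] = 1 if int(rel) > 0 else 0
    return labels


def build_training_set(labels, features):
    """Join labels with feature vectors for the pairs that have both.

    `features` maps (topic, docno) to a list of floats; how those numbers
    are computed (TF, TF-IDF, BM25, ...) is outside this sketch.
    """
    keys = sorted(k for k in labels if k in features)
    X = [list(features[k]) for k in keys]
    y = [labels[k] for k in keys]
    return keys, X, y


def score(w, x):
    """Score a document vector x with a learned weight vector w (dot product)."""
    if len(w) != len(x):
        raise ValueError("weight and feature vectors differ in length")
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical judgements and feature values, for illustration only.
qrels_lines = ["301 0 DOC-A 1", "301 0 DOC-B 0", "302 0 DOC-C 2"]
features = {
    ("301", "DOC-A"): [3.0, 1.2, 7.5],  # [tf, tf-idf, bm25]
    ("301", "DOC-B"): [0.0, 0.1, 0.4],
}

labels = parse_qrels(qrels_lines)
keys, X, y = build_training_set(labels, features)

# Any supervised learner could be fitted on (X, y). The weights below are
# placeholders, not learned values.
w = [1.0, 0.0, 2.0]
for key, x in zip(keys, X):
    print(key, score(w, x))
```

Pairs that have a judgement but no feature vector are simply left out, so
only documents that were both retrieved and judged end up in the training
set.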

Sure, I am certainly very happy to discuss it with you, because that's how I
can convey my idea well: by answering questions.

Also, I am very new to the formalities of submitting a GSoC application,
so if things happen early I will have enough time to shape the
application in light of the feedback.


On Sun, Apr 3, 2011 at 8:10 PM, Olly Betts <olly at survex.com> wrote:

> On Fri, Apr 01, 2011 at 02:48:28PM +0530, Parth Gupta wrote:
> > In Learning to Rank (Letor) we prepare features which represent a
> > query-document pair. After the initial retrieval we take, say, the first
> > 20 or 30 documents and represent them as feature vectors. Based on the
> > training data, our supervised learning then gives a score to each
> > document for a particular query. For example, if the learning is done by
> > regression, we have to learn a 'W' vector which gives a score to the
> > document vector by dot product.
> >
> > Here the features can be term frequency, TF-IDF score, BM25 score etc.,
> > and as many others as are useful. For the learning there are many
> > machine learning techniques available.
>
> What would be your plan for gathering data to train with?  Some sort of
> click-through measurements?
>
> On Sun, Apr 03, 2011 at 12:37:27PM +0530, Parth Gupta wrote:
> > Please give your feedback on the possibility of exploring the idea so
> > that I can incorporate those things in my application.
>
> It seems an interesting project to me, though I'm not sure I know enough
> about the area to offer much in the way of useful insights.  I can
> probably ask some stupid questions though.
>
> But I'm certainly happy to consider an application from you for working
> on this.
>
> Cheers,
>     Olly