[Xapian-devel] Learning to Rank : GSoC 2012

Mon Apr 2 11:36:36 BST 2012

Hello Ashish,

> This is in reference to the "Learning to Rank" project idea. [I know I am
> a bit late, but I hope you are still interested in helping out.]
> I am looking for suggestions to help me narrow down the choice of
> algorithms. I have been surveying the referenced algorithms in order to
> choose the right one. I am listing some of my doubts here so that we can
> discuss them and I can get the concepts clear, and end up choosing the
> most suitable algorithm. I am sure your input will help me draft my
> proposal effectively.
> ListNet looks computationally more complex than RankNet. So, is there any
> big advantage (in terms of improvement in NDCG/MAP) in moving to ListNet?
> Also, the optimization suggested in the paper, looking only at the top
> one, seems too simple. What will be the impact on accuracy, and is there
> any way to speed up or optimize ListNet?
>

Don't worry about the late entry. If your question is just between RankNet
and ListNet, then I would say that considering the top k ranks in the
optimization gives you more choices for identifying the relevant documents
than the top 2 in RankNet. Yes, the concept of ListNet is simple, but it is
still effective. Moreover, implementing ListNet automatically implements
RankNet (if I am not wrong, choosing k=2 makes it RankNet).
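
To make the relation between the two losses concrete, here is a minimal
sketch in plain Python. It assumes the top-one form of ListNet (a softmax
over scores, compared by cross entropy) and the standard RankNet pairwise
cross entropy. The function names, labels and scores are made up for
illustration and are not part of Xapian. It only checks the special case of
a list with exactly two documents, where the two losses coincide.

```python
import math

def top_one_probs(scores):
    """Softmax over scores: the top-one probability of each document."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_top_one_loss(labels, scores):
    """Cross entropy between the label-induced and score-induced
    top-one distributions of one query's document list."""
    target = top_one_probs(labels)
    predicted = top_one_probs(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

def ranknet_pair_loss(target_prob, s_i, s_j):
    """RankNet cross entropy for one pair, with P(i > j) = sigmoid(s_i - s_j)."""
    p = 1.0 / (1.0 + math.exp(-(s_i - s_j)))
    return -target_prob * math.log(p) - (1.0 - target_prob) * math.log(1.0 - p)

# Hypothetical toy data: one query with two documents.
labels = [2.0, 0.0]   # relevance judgements
scores = [0.3, -0.1]  # model outputs

# Target pair probability derived from the labels in the same way.
target_prob = top_one_probs(labels)[0]

listnet = listnet_top_one_loss(labels, scores)
ranknet = ranknet_pair_loss(target_prob, scores[0], scores[1])
print(listnet, ranknet)
```

With longer lists the ListNet loss still needs only one softmax per query,
so the top-one variant stays linear in the number of documents per query.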

> For AdaRank, I didn't understand how it is superior to linear
> regression.
> I was also searching for an open-source package for training ListNet, to
> save time and focus on the more important aspects of the library
> enhancements, but didn't find a suitable one. However, FANN is still on my
> to-check list, and meanwhile I have been experimenting with training
> ListNet in Octave by reusing some of my ml-class code (an online course by
> Professor Andrew Ng that I participated in). What is the quickest way to
> understand the modularity involved in implementing any algorithm to
> serve the current need?
>

Linear regression can also be a choice, but it tries to fit the whole
training data as a least-squares problem (which can be solved with a
singular value decomposition, SVD) and gives you a weight vector. In past
comparisons it has mostly performed worse than RankBoost and sometimes
worse than AdaRank. Boosting-based techniques, on the other hand, have
performed better on the Yahoo dataset. Anyway, linear regression can be
incorporated, but on its own it would be insufficient for a GSoC project.
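
As a rough illustration of the pointwise approach, the sketch below fits a
weight vector by least squares and ranks documents by the resulting score.
It is only a toy under stated assumptions: it solves the normal equations
by Gauss-Jordan elimination instead of an SVD, the feature values and
labels are invented, and the helper names are hypothetical.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting. Assumes X has full column rank."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def score(w, x):
    """Linear score of one document's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy data: each row is [bias, feature_1, feature_2].
X = [[1.0, 0.9, 0.1],
     [1.0, 0.5, 0.4],
     [1.0, 0.1, 0.8],
     [1.0, 0.7, 0.3]]
y = [2.2, 1.1, -0.1, 1.6]  # made-up relevance labels

w = fit_least_squares(X, y)
# Document indices sorted from highest to lowest predicted score.
ranking = sorted(range(len(X)), key=lambda i: -score(w, X[i]))
print(w, ranking)
```

Note that the fit minimises the error on each label independently; nothing
in the objective looks at the order of documents within a query.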

Hope this helps. Thanks, Rishabh, for the comments on the same.

Regards,
Parth.

>
> Thanks,
>
> Regards,
> Ashish Sadh
> B.Tech, final year student.
> Indian Institute of Information Technology, Allahabad, India.