[Xapian-devel] [xapian] GSoC - Learning to Rank, Introduction and some Ideas

Sat Mar 31 16:52:55 BST 2012

Hi Mudit,

Please do not mail me privately; use the mailing list.

> It seems quite similar to random forest algorithms; in fact, if we remove
> the step of selecting bootstrap samples of the data, the two are almost the
> same: both select a random subset of variables, split nodes based on
> threshold criteria and mode data. We will also get the variable importance
> matrix, and even the proximity matrix I guess. I think I will make my
> proposal on the same.
>

cool.
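
For reference, the split step described in the quoted paragraph (pick a random subset of variables, then split on a threshold) can be sketched roughly as below. This is only a minimal sketch, assuming relevance labels are treated as regression targets and candidate splits are scored by variance reduction; the function names are hypothetical and are not part of Xapian or of the code from [5].

```python
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pick_random_split(rows, labels, n_candidates, rng):
    """Choose a node split in the style of extremely randomized trees.

    Draw n_candidates features at random, draw ONE uniformly random
    threshold per feature (no search for the optimal cut point), and keep
    the candidate with the largest variance reduction.
    Returns (feature_index, threshold), or None if no drawn feature varies.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(n_candidates, n_features))
    total = variance(labels)
    best, best_gain = None, -1.0
    for f in candidates:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        if lo == hi:
            continue  # constant feature in this node, nothing to split on
        threshold = rng.uniform(lo, hi)
        left = [y for r, y in zip(rows, labels) if r[f] < threshold]
        right = [y for r, y in zip(rows, labels) if r[f] >= threshold]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(labels)
        gain = total - weighted
        if gain > best_gain:
            best, best_gain = (f, threshold), gain
    return best


# Toy node: feature 0 is constant, feature 1 separates the labels.
rows = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]]
labels = [0, 0, 2, 2]
split = pick_random_split(rows, labels, n_candidates=2, rng=random.Random(0))
```

The whole training set is used at each tree here (no bootstrap sampling), which is the difference from a standard random forest mentioned above.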

>
> I was going through the Xapian source and also the Letor tool which was
> added to Xapian, and I read the documentation. What I could figure out is
> that I can use the functions already implemented in Xapian to construct my
> feature vector from the data set. Since the present implementation uses
> an SVM-based ML method, I have to feed the feature vector to the new Random
> Forest algorithm instead, and I will get a ranking of pages. One more thing:
> the input vector format will be a pair of a document and its (many)
> features. Am I right?
>

Right, the current input format is listed at the LTR project page

http://trac.xapian.org/wiki/GSoC2011/LTR

and you can use the same features. You basically need to build a framework
which can take the data in that format and return the ranked list, quite
similar to the existing approach. If your approach is pairwise/listwise, you
also need to construct the pairs out of the training data.
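
As a rough illustration of constructing pairs out of the training data, here is a minimal sketch. It assumes LETOR/SVM-light-style lines of the form "<label> qid:<id> <feature>:<value> ..." (please check the project page above for the exact layout), and the function names are hypothetical rather than anything in the Letor code.

```python
from collections import defaultdict
from itertools import combinations


def parse_line(line):
    """Parse '<label> qid:<id> <fid>:<value> ...' (assumed layout).

    Returns (label, qid, features), or None for blank/comment-only lines.
    """
    body = line.split("#", 1)[0].split()  # drop a trailing '# comment'
    if not body:
        return None
    label = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return label, qid, features


def build_pairs(lines):
    """Build pairwise preferences within each query.

    For every two documents of the same query whose relevance labels
    differ, emit (qid, preferred_features, other_features). Documents with
    equal labels carry no preference and are skipped.
    """
    by_query = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            label, qid, features = parsed
            by_query[qid].append((label, features))

    pairs = []
    for qid, docs in by_query.items():
        for (la, fa), (lb, fb) in combinations(docs, 2):
            if la > lb:
                pairs.append((qid, fa, fb))
            elif lb > la:
                pairs.append((qid, fb, fa))
    return pairs


# Made-up sample data, for illustration only.
sample = [
    "2 qid:1 1:0.9 2:0.3",
    "0 qid:1 1:0.1 2:0.4",
    "1 qid:1 1:0.5 2:0.2",
    "1 qid:2 1:0.7 2:0.7",
    "1 qid:2 1:0.6 2:0.1",
]
pairs = build_pairs(sample)
```

Pairs are only formed between documents of the same query, since relevance labels are not comparable across queries.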

Parth.

>
> Hope to hear from you soon.
>
> Best Regards,
>
> Mudit Raj Gupta
>
> On Fri, Mar 30, 2012 at 5:28 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>
>>
>>
>>> Thank you for your reply. I am more inclined towards random forest for
>>> ranking. I was planning to complete my proposal soon. Should I include a
>>> literature survey of various algorithms in my proposal or should I choose
>>> one and concentrate on details?
>>>
>>
>> Well, a concrete plan for the project will be necessary. So you should
>> focus more on the algorithm which you plan to implement and how you are
>> planning to go about it.
>>
>> Parth.
>>
>>>
>>> Best,
>>>
>>> Mudit
>>>
>>>
>>> On Fri, Mar 30, 2012 at 4:48 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>>
>>>> Hi Mudit,
>>>>
>>>> Good to know about you.
>>>>
>>>>>
>>>>> I successfully completed my *Google Summer of Code - 2011* for the *Center
>>>>> for the Study of Complex Systems - University of Michigan*. I
>>>>> implemented various *algorithms (ant colony, random walk etc.)* related
>>>>> to computational intelligence in Repast S (*Coded in Groovy, Java*)
>>>>> and wrote *extensive documentation and tutorials* for the related
>>>>> models with *literature reviews* on the topics. My *contributions to
>>>>> Repast S were part of the latest release of the software*. The
>>>>> detailed documentation and code can be found here:
>>>>> http://code.google.com/p/cscs-repast-demos/wiki/Mudit I have also
>>>>> worked on various projects related to implementation of Machine Learning
>>>>> and Bio-Inspired Evolutionary Algorithms. You can check the code and some
>>>>> documentation for these on code.google.com; my user profile is:
>>>>> http://code.google.com/u/110675325175605367090/
>>>>>
>>>>
>>>> Seems your previous experience with machine learning will help you.
>>>>
>>>>
>>>>>
>>>>> I am interested in applying for the project - *"Learning to Rank"*. I
>>>>> have read the pointers on the ideas page and some literature about it.
>>>>> Based on my literature review, I was thinking that something along the
>>>>> lines of a Multi-layer Perceptron network with Ant Colony Optimization or
>>>>> an Improved Random Forest could be a good option. I selected these because
>>>>> of my experience with the topic. Any further details/pointers to the
>>>>> project would be greatly appreciated. I would also like to request that
>>>>> you let me know about any specific detail related to the project that is
>>>>> required in the proposal (apart from the ones mentioned on the page).
>>>>>
>>>>
>>>> There have been plenty of algorithms proposed in the recent past. Based
>>>> on your choice of ML technique, you can choose one. As you are interested
>>>> in neural-net-based approaches, ListNet [1], RankNet [2], ListMLE [3] and
>>>> LambdaRank [4] may be of interest, and if you want to explore the random
>>>> forest based approaches then [5] can be checked out.
>>>>
>>>> [1] Learning to rank: from pairwise approach to listwise approach
>>>> [2] Learning to Rank using Gradient Descent
>>>> [3] Listwise Approach to Learning to Rank - Theory and Algorithm
>>>> [4] Learning to Rank with Nonsmooth Cost Functions
>>>> [5] Learning to rank with extremely randomized trees
>>>>
>>>>
>>>> Regards,
>>>> Parth.
>>>>
>>>>
>>>>
>>>>> Thank you for your time. Hope to hear from you soon.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Mudit Raj Gupta
>>>>>