[Xapian-devel] Learning to rank

Wed Apr 4 06:36:37 BST 2012

Parth & Olly,

I have submitted the proposal on the google-melange site. Please have a
look at it and share your comments so that I can improve it.

regards,

On Sat, Mar 31, 2012 at 12:06 AM, pankaj singhal <pankajsinghal at ieee.org> wrote:

> Parth,
>
> It would be very helpful if you could send me the proposal you made last
> year so that I can refer to it for my proposal, if you are OK with that.
>
>
> On Fri, Mar 30, 2012 at 11:39 PM, pankaj singhal <pankajsinghal at ieee.org> wrote:
>
>> Parth,
>>
>> As I am new to the formalities of submitting an application and writing
>> a good proposal, I would like your feedback on my application so that I
>> can revise it accordingly. Also, how am I supposed to show you the
>> proposal I have drafted?
>>
>> regards
>>
>> On Fri, Mar 30, 2012 at 5:34 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>
>>> Hi Pankaj,
>>>
>>> Nice to see that you have chosen the algorithm. Yes, indeed ListMLE
>>> would be a nice choice. The difference between ListNet and ListMLE is
>>> the loss function: the former minimises the cross entropy while the
>>> latter minimises the likelihood loss.
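
To make the distinction between the two loss functions concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not code from Xapian or from either paper: the function names are hypothetical, ListNet is shown in its top-one-probability form, and both losses use the exponential transformation of scores.

```python
import math


def softmax(values):
    """Turn raw values into a probability distribution (numerically stable)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top1_loss(scores, labels):
    """Cross entropy between the top-one distributions of labels and scores."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))


def listmle_loss(scores, labels):
    """Negative log-likelihood of the label-sorted permutation (Plackett-Luce)."""
    # Ground-truth permutation: documents sorted by descending relevance label.
    order = sorted(range(len(scores)), key=lambda i: labels[i], reverse=True)
    ranked = [scores[i] for i in order]
    loss = 0.0
    log_z = float("-inf")  # running log-sum-exp over the suffix of the list
    for s in reversed(ranked):
        m = max(log_z, s)
        log_z = m + math.log(math.exp(log_z - m) + math.exp(s - m))
        loss += log_z - s
    return loss
```

Both functions take the model scores and the relevance labels for the documents of one query. After the sort, the ListMLE loop makes a single backward pass over the list.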
>>>
>>> It would be better if you investigate this and try to include it in your
>>> proposal.
>>>
>>> Parth.
>>>
>>>> Here is the idea which I want to incorporate and which would be a good
>>>> extension to the LTR project and Xapian.
>>>> I want to implement the ListMLE algorithm[1] in Xapian. The algorithm
>>>> uses a listwise approach with a neural network as the model and gradient
>>>> descent as the algorithm for optimising the loss function. ListMLE is an
>>>> extension of ListNet[2], which is itself (somewhat) an extension of
>>>> RankNet[3]. This algorithm has shown better performance than the other
>>>> two. It also has linear complexity.
>>>>
>>>> Regarding the features for the query-document pair, research has
>>>> identified many good features that help tune the parameters of the
>>>> ranking function so that it can discriminate between documents better.
>>>> These can be calculated from a basic set of features (tf, idf, BM25,
>>>> etc.); the more, the better.
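
As an illustration of how such basic features could be assembled for one query-document pair, here is a small sketch in plain Python. It is only an assumption-laden example: the function names and inputs are hypothetical, the BM25 form is one common variant (with `k1` and `b` defaults chosen for illustration), and it is not Xapian's own weighting code.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency (a smoothed, always-positive variant)."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)


def bm25_term(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf(num_docs, doc_freq) * tf * (k1 + 1.0) / (tf + norm)


def query_document_features(query_terms, doc_tf, doc_len, avg_doc_len,
                            num_docs, doc_freqs):
    """Return [sum of tf, sum of idf, sum of BM25] over the query terms."""
    tf_sum = 0
    idf_sum = 0.0
    bm25_sum = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)       # term frequency in this document
        df = doc_freqs.get(term, 0)    # number of documents containing term
        tf_sum += tf
        idf_sum += idf(num_docs, df)
        bm25_sum += bm25_term(tf, doc_len, avg_doc_len, num_docs, df)
    return [tf_sum, idf_sum, bm25_sum]
```

A vector like this, computed per query-document pair, is the kind of input a listwise ranker would be trained on. Further features can be appended to the returned list in the same way.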
>>>>
>>>> Regarding the training data, we can use the OHSUMED[4] data-set, a
>>>> benchmark data-set released in LETOR 2.0 (Microsoft Research) and used by
>>>> the developers of the algorithm for training and testing. This
>>>> data-set is reliable because the relevance degrees of documents with
>>>> respect to the queries are judged by humans. Its authors try to adopt the
>>>> ‘standard’ features proposed in the IR community. Features similar to
>>>> those in the data-set can be incorporated while implementing the
>>>> algorithm in Xapian.
>>>>
>>>> Implementing this algorithm would definitely be a good improvement to
>>>> the current LTR project, as it uses a listwise approach, which is far
>>>> better than the current pointwise approach. Also, the OHSUMED data-set
>>>> uses more and better features than those currently used, and we can
>>>> use them too.
>>>>
>>>> Please give feedback on the idea and suggest any exploration needed.
>>>>
>>>>
>>>> [1] - http://research.microsoft.com/en-us/people/tyliu/icml-listmle.pdf
>>>> [2] - http://research.microsoft.com/apps/pubs/default.aspx?id=70428
>>>> [3] - http://research.microsoft.com/en-us/um/people/cburges/papers/ICML_ranking.pdf
>>>> [4] - http://research.microsoft.com/en-us/um/beijing/projects/letor//letor-old.aspx
>>>>
>>>>
>>>> regards,
>>>>
>>>>
>>>> On Wed, Mar 28, 2012 at 7:58 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>>>
>>>>> Pankaj,
>>>>>
>>>>> FANN looks fine. But in the proposal I would like to see something
>>>>> specific about what you plan to do with it, such as implementing
>>>>> RankNet, ListNet or some other algorithm.
>>>>>
>>>>> Parth.
>>>>>
>>>>>
>>>>> On Wed, Mar 28, 2012 at 6:19 AM, Olly Betts <olly at survex.com> wrote:
>>>>>
>>>>>> On Tue, Mar 27, 2012 at 05:26:45PM +0530, pankaj singhal wrote:
>>>>>> > I have come across these C++ neural-frameworks:
>>>>>> > FANN <http://leenissen.dk/fann/wp/>
>>>>>> > Libann <http://www.nongnu.org/libann/doc/libann_4.html#SEC17>
>>>>>>
>>>>>> Did you check the licences?  Libann's site clearly says it's GPL and as
>>>>>> I said in the message you replied to, we'd rather not add more GPL
>>>>>> dependencies.
>>>>>>
>>>>>> > I would like you to look at these libraries, as incorporating them
>>>>>> > reduces the need to implement the ML algorithm from scratch.
>>>>>>
>>>>>> FANN says it is LGPL, which is probably OK.  I've no idea if it fulfils
>>>>>> the needs of the project.  Parth may be able to comment more usefully,
>>>>>> but ultimately you'll need to show us in your proposal that the
>>>>>> libraries you're intending to use are suitable, so you'll need to look
>>>>>> into this more deeply yourself.
>>>>>>
>>>>>> Cheers,
>>>>>>     Olly
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Pankaj Singhal
>>>> III Year, CSE
>>>> The LNMIIT, Jaipur, India.
>>>>
>>>> Mob: +918875053936
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-- 
Pankaj Singhal
III Year, CSE
The LNMIIT, Jaipur, India.

Mob: +918875053936