[Xapian-devel] Learning to rank

pankaj singhal pankajsinghal at ieee.org
Fri Mar 30 19:09:36 BST 2012


Parth,

As I am new to the formalities of submitting an application and writing a
good proposal, I would appreciate your feedback on my application so that I
can revise it accordingly. Also, how should I share the draft proposal with
you?

regards

On Fri, Mar 30, 2012 at 5:34 PM, Parth Gupta <parthg.88 at gmail.com> wrote:

> Hi Pankaj,
>
> Nice to see that you have chosen an algorithm. Yes, ListMLE would indeed
> be a nice choice, since the difference between ListNet and ListMLE is the
> loss function. The former minimises the cross entropy while the latter
> minimises the likelihood loss.
>
> It would be better if you investigated this and tried to include it in
> your proposal.
>
> Parth.
>
>> Here is the idea which I want to incorporate and which would be a good
>> extension to the LTR project and Xapian.
>> I want to implement the ListMLE[1] algorithm in Xapian. The algorithm
>> uses a listwise approach, with a neural network as the model and gradient
>> descent as the optimisation algorithm (with a highly optimised loss
>> function). ListMLE is an extension of ListNet[2], which is itself
>> (somewhat) an extension of RankNet[3]. ListMLE has shown better
>> performance than the other two. It also has linear complexity.
>>
>> Regarding the features for the query-document pair, research has
>> identified many good features that help tune the parameters of the
>> ranking function so that it differentiates documents better. These can be
>> calculated from the basic set of features (tf, idf, bm25, etc.); the more
>> the better.
>>
>> Regarding the training data, we can use the OHSUMED[4] data-set, a
>> benchmark data-set released in LETOR 2.0 (Microsoft Research) and used by
>> the developers of the algorithm for training and testing. This data-set
>> is reliable because the relevance degrees of documents with respect to
>> the queries are judged by humans. Its authors try to adopt the ‘standard’
>> features proposed in the IR community. Similar features to those used in
>> the data-set can be incorporated while implementing the algorithm in
>> Xapian.
>>
>> Implementing this algorithm would definitely be a good improvement to the
>> current LTR project, as it uses a listwise approach, which is far better
>> than the current pointwise approach. Also, the OHSUMED data-set uses more
>> and better features than those currently used, and we can use them too.
>>
>> Please give feedback on the idea and suggest any exploration needed.
>>
>>
>> [1] - http://research.microsoft.com/en-us/people/tyliu/icml-listmle.pdf
>> [2] - http://research.microsoft.com/apps/pubs/default.aspx?id=70428
>> [3] -
>> http://research.microsoft.com/en-us/um/people/cburges/papers/ICML_ranking.pdf
>> [4] -
>> http://research.microsoft.com/en-us/um/beijing/projects/letor//letor-old.aspx
>>
>>
>> regards,
>>
>>
>> On Wed, Mar 28, 2012 at 7:58 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>
>>> Pankaj,
>>>
>>> FANN looks fine. But in the proposal I would like to see something
>>> specific about what you plan to do with it, such as implementing
>>> RankNet, ListNet or something else.
>>>
>>> Parth.
>>>
>>>
>>> On Wed, Mar 28, 2012 at 6:19 AM, Olly Betts <olly at survex.com> wrote:
>>>
>>>> On Tue, Mar 27, 2012 at 05:26:45PM +0530, pankaj singhal wrote:
>>>> > I have come across these C++ neural-frameworks:
>>>> > FANN <http://leenissen.dk/fann/wp/>
>>>> > Libann <http://www.nongnu.org/libann/doc/libann_4.html#SEC17>
>>>>
>>>> Did you check the licences?  Libann's site clearly says it's GPL and as
>>>> I said in the message you replied to, we'd rather not add more GPL
>>>> dependencies.
>>>>
>>>> > I want you to look at these libraries, since incorporating them
>>>> > reduces the need to implement the ML algorithm from scratch.
>>>>
>>>> FANN says it is LGPL, which is probably OK.  I've no idea if it fulfils
>>>> the needs of the project.  Parth may be able to comment more usefully,
>>>> but ultimately you'll need to show us in your proposal that the
>>>> libraries you're intending to use are suitable, so you'll need to look
>>>> into this more deeply yourself.
>>>>
>>>> Cheers,
>>>>     Olly
>>>>
>>>
>>>
>>
>>
>
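
To make the loss-function difference between ListNet and ListMLE mentioned
in the thread above more concrete, here is a minimal sketch in plain Python.
It is not Xapian or FANN code, and the function names (log_softmax,
listnet_top1_loss, listmle_loss) are hypothetical names chosen for
illustration. It assumes the common formulations: ListNet as the cross
entropy between "top-one" softmax distributions of the labels and of the
model scores, and ListMLE as the negative log-likelihood of the label-sorted
permutation under a Plackett-Luce model. Ties in relevance are simply broken
by input order here, which is also an assumption.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]


def listnet_top1_loss(scores, relevance):
    """Cross entropy between the top-one distributions of labels and scores.

    Assumption: the target distribution is a softmax over the relevance labels.
    """
    log_p_model = log_softmax(scores)
    p_target = [math.exp(v) for v in log_softmax(relevance)]
    return -sum(p * lq for p, lq in zip(p_target, log_p_model))


def listmle_loss(scores, relevance):
    """Negative log-likelihood of the label-sorted permutation (Plackett-Luce).

    Assumption: ties in relevance are broken by input order (stable sort).
    """
    order = sorted(range(len(scores)), key=lambda i: relevance[i], reverse=True)
    ranked = [scores[i] for i in order]
    loss = 0.0
    for k in range(len(ranked)):
        tail = ranked[k:]  # documents not yet placed in the ranking
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - ranked[k]
    return loss


if __name__ == "__main__":
    relevance = [2, 1, 0]        # human-judged relevance degrees for one query
    good = [3.0, 2.0, 1.0]       # model scores that agree with the labels
    bad = [1.0, 2.0, 3.0]        # model scores that reverse the labels
    print("ListNet:", listnet_top1_loss(good, relevance),
          listnet_top1_loss(bad, relevance))
    print("ListMLE:", listmle_loss(good, relevance),
          listmle_loss(bad, relevance))
```

In both cases the loss should be smaller when the model scores order the
documents the same way as the relevance labels. In a real implementation the
scores would come from the ranking model (for example a neural network), and
gradient descent would adjust its parameters to reduce the chosen loss.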


-- 
Pankaj Singhal
III Year, CSE
The LNMIIT, Jaipur, India.

Mob: +918875053936