[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)

Parth Gupta pargup8 at gmail.com
Wed Mar 27 01:05:02 GMT 2013


That sounds good.

Parth.

On Mon, Mar 25, 2013 at 8:17 PM, Mudit Gupta <mudit.raaj.gupta at gmail.com> wrote:

> Hi Guys,
>
> Thank you for your time and input.
>
> I just saw the updated description of the project. It seems that the GSoC
> work can be divided into three major parts: first, writing an API for the
> Xapian Letor module; then, writing a test suite for Xapian-Letor; finally,
> incorporating 1-2 feature selection algorithms. I think it would also
> be useful if 1-2 detailed examples of API usage were incorporated in the
> project.
>
> I will fork Rishabh's branch and try to play with the code. I will go
> through resources about feature selection algorithms and post an update
> to the mailing list. I haven't looked into Rishabh's code in detail, but
> I am guessing it might need some more refactoring. Also, I will share
> the API design.
>
> Best Regards,
>
> Mudit Raj Gupta
>
> On Mon, Mar 25, 2013 at 5:20 PM, Parth Gupta <pargup8 at gmail.com> wrote:
>
>> Hi Mudit,
>>
>> As Olly has pointed out, this year we are not planning to add new
>> ranking algorithms. Rather, we will consolidate the project around
>> the existing ranking algorithms. It would, however, be interesting to
>> incorporate one or more feature selection algorithms.  See the updated
>> Learning to Rank project description on the ideas page.
>>
>> If you are interested in working on this project, it would be a great
>> start to fork Rishabh's branch and debug the code. That would give you
>> much more insight into the project and help you better formulate your
>> application.
>>
>> Regards,
>> Parth.
>>
>> On Fri, Mar 22, 2013 at 6:37 AM, Olly Betts <olly at survex.com> wrote:
>>
>>> On Thu, Mar 21, 2013 at 06:58:41PM +0530, Mudit Gupta wrote:
>>> > I am interested in the "Learning To Rank" project.  If I am not
>>> > wrong, I found the framework incorporated by Parth in the cloned
>>> > code. It needed some refactoring in order to incorporate more
>>> > algorithms; this was done by Rishabh and is available in his git
>>> > repo (https://github.com/rishabhmehrotra/xapian) but is still not
>>> > merged. So, I assume I should think of additions to the code in
>>> > Rishabh's repo.
>>>
>>> Yes, I think that's the best starting point.
>>>
>>> > Moreover, I noticed that SVM-rank, ListMLE and ListNet are already
>>> > present in the code. I am interested in adding a random forest
>>> > approach and am looking for appropriate libraries. It would be
>>> > great to get input from the Xapian community in terms of preference
>>> > of algorithms and open source libraries. It would also be great to
>>> > know the priority of the Letor project to the Xapian community.
>>>
>>> Parth and I talked this over recently, and we concluded that this year a
>>> major focus should be on consolidating the existing work.  That doesn't
>>> necessarily mean that new features can't be looked at, but one of the
>>> deliverables should really be a xapian-letor module which we're happy to
>>> tag as a stable release.  A project which adds more algorithms is
>>> interesting, but if the end result isn't useful to Xapian users, there's
>>> much less benefit to be had from it.
>>>
>>> One of the major things missing is a testsuite.  Without any automated
>>> tests, it's hard to have much confidence that the code works, and it
>>> makes it much harder to make changes to the code in the future without
>>> introducing new bugs.  So I think adding a testsuite is important.
>>> The harness from xapian-core is suitable, but testcases need writing,
>>> and the bugs that actually writing testcases will inevitably uncover
>>> need fixing.
>>>
>>> We should also look at what features are missing from xapian-core
>>> which would be useful for xapian-letor, and consider implementing them -
>>> especially if they have other potential uses.  Two that I'm aware of
>>> are:
>>>
>>> * Fundamentally, xapian-letor wants to take a Xapian::MSet object and
>>>   reorder it, so an API which allows that would be handy - then the
>>>   output of xapian-letor can be an Xapian::MSet object, allowing it to
>>>   be cleanly slotted into existing applications using the Xapian API.
>>>   An MSet reordering API also has other potential uses - for example,
>>>   clustering results.
>>>
>>> * Field-related features currently have to be calculated specially by
>>>   xapian-letor, but these would also be useful to have for other uses
>>>   (e.g. implementing BM25f) so tracking them in the database backend
>>>   in xapian-core is worth investigating.
>>>
>>> I'll update the entry on the project ideas page with the above shortly.
>>>
>>> Cheers,
>>>     Olly
>>>
>>>
>>
>>
>
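
As a rough illustration of the MSet reordering idea in Olly's message
above, here is a minimal sketch in plain Python. It does not use the
Xapian API at all: `Hit`, `rerank` and the `learned` scores are
hypothetical stand-ins, chosen only to show the shape of "take a ranked
result list, rescore it, and hand back a list of the same type".

```python
from typing import Callable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical stand-in for one result entry: a document id and its weight."""
    docid: int
    weight: float


def rerank(hits: List[Hit], score: Callable[[int], float]) -> List[Hit]:
    """Return a new list of hits ordered by the given score, highest first.

    Ties keep their original relative order because sorted() is stable.
    """
    rescored = [Hit(h.docid, score(h.docid)) for h in hits]
    return sorted(rescored, key=lambda h: h.weight, reverse=True)


# Initial retrieval order, plus made-up scores from an assumed learned model.
initial = [Hit(7, 2.1), Hit(3, 1.8), Hit(9, 1.2)]
learned = {7: 0.2, 3: 0.9, 9: 0.5}
reranked = rerank(initial, learned.__getitem__)
```

Because the output has the same type as the input, code that consumes
the original result list can consume the reordered one unchanged, which
is the property the proposed reordering API is after.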