[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)

Mudit Gupta mudit.raaj.gupta at gmail.com
Mon Mar 25 12:17:15 GMT 2013


Hi Guys,

Thank you for your time and input.

I just saw the updated description of the project. It seems that the GSoC
work can be divided into three major parts: first, writing an API for the
Xapian Letor module; then, writing a test suite for xapian-letor; and
finally, incorporating one or two feature selection algorithms. I think it
would also be useful to include one or two detailed examples of API usage
in the project.

I will fork Rishabh's branch and experiment with the code. I will also go
through resources on feature selection algorithms and post an update on
the mailing list. I haven't looked into Rishabh's code in detail, but I am
guessing it might need some more refactoring. I will also share the API
design.

Best Regards,

Mudit Raj Gupta

On Mon, Mar 25, 2013 at 5:20 PM, Parth Gupta <pargup8 at gmail.com> wrote:

> Hi Mudit,
>
> As Olly has pointed out, this year we are not planning to add new ranking
> algorithms. Rather, we will consolidate the project with the present
> ranking algorithms. It would also be interesting to incorporate one or
> more feature selection algorithms. See the updated Learning to Rank
> project description on the ideas page.
>
> If you are interested in working on this project, a great way to start
> would be to fork Rishabh's branch and debug the code. That would give you
> much more insight into the project and help you better formulate your
> application.
>
> Regards,
> Parth.
>
> On Fri, Mar 22, 2013 at 6:37 AM, Olly Betts <olly at survex.com> wrote:
>
>> On Thu, Mar 21, 2013 at 06:58:41PM +0530, Mudit Gupta wrote:
>> > I am interested in the "Learning To Rank" project.  If I am not wrong,
>> > I found the framework incorporated by Parth in the cloned code. It
>> > needed some refactoring in order to incorporate more algorithms; this
>> > was done by Rishabh and is available in his git repo
>> > (https://github.com/rishabhmehrotra/xapian) but is still not merged.
>> > So, I assume I should think of additions to the code in Rishabh's repo.
>>
>> Yes, I think that's the best starting point.
>>
>> > Moreover, I noticed that SVM-rank, ListMLE and
>> > ListNet are already present in the code. I am interested in adding a
>> > random forest approach and am looking for appropriate libraries. It
>> > would be great to get input from the Xapian community in terms of
>> > preference of algorithms and open source libraries. It would also be
>> > great to know the priority of the Letor project to the Xapian community.
>>
>> Parth and I talked this over recently, and we concluded that this year a
>> major focus should be on consolidating the existing work.  That doesn't
>> necessarily mean that new features can't be looked at, but one of the
>> deliverables should really be a xapian-letor module which we're happy to
>> tag as a stable release.  A project which adds more algorithms is
>> interesting, but if the end result isn't useful to Xapian users, there's
>> much less benefit to be had from it.
>>
>> One of the major things missing is a testsuite.  Without any automated
>> tests, it's hard to have much confidence that the code works, and it
>> makes it much harder to make changes to the code in the future without
>> introducing new bugs.  So I think adding a testsuite is important.
>> The harness from xapian-core is suitable, but testcases need writing,
>> and the bugs that actually writing testcases will inevitably uncover
>> need fixing.
>>
>> We should also look at what features are missing from xapian-core
>> which would be useful for xapian-letor, and consider implementing them -
>> especially if they have other potential uses.  Two that I'm aware of
>> are:
>>
>> * Fundamentally, xapian-letor wants to take a Xapian::MSet object and
>>   reorder it, so an API which allows that would be handy - then the
>>   output of xapian-letor can be a Xapian::MSet object, allowing it to
>>   be cleanly slotted into existing applications using the Xapian API.
>>   An MSet reordering API also has other potential uses - for example,
>>   clustering results.
>>
>> * Field-related features currently have to be calculated specially by
>>   xapian-letor, but these would also be useful to have for other uses
>>   (e.g. implementing BM25f) so tracking them in the database backend
>>   in xapian-core is worth investigating.
>>
>> I'll update the entry on the project ideas page with the above shortly.
>>
>> Cheers,
>>     Olly
>>