GSoC 2016 Letor Stabilisation

Ayush Tomar ayushtomar at gmail.com
Tue Mar 22 05:17:27 GMT 2016


Hello Parth,

Thanks for your suggestion. I'll try to submit the first proposal draft by
tomorrow.

Following your and James' suggestions, these are the broad points around
which my proposal is going to be structured for each milestone in the timeline:

1. Merge code from v-hasu's and VcamX's letor implementations
2. Write Doxygen API documentation
3. Write topic documentation, modelled on xapian-core/docs, explaining
individual core concepts like ListMLE, svmrank, ListNet etc. (My prior
knowledge of machine learning would be useful here)
4. Set up a test harness for letor. Write unit and API tests for the
introduced code and check coverage
5. Write practical code examples for the Getting Started guide

I'll try to commit some code immediately after submitting my final
proposal, with the above things at least set up.

Regards,
Ayush

On Mon, Mar 21, 2016 at 11:29 PM, Parth Gupta <pargup8 at gmail.com> wrote:

> Hi Ayush,
>
> On top of what James has to say, I would recommend focusing first on
> VcamX's branch, as he was working on API streamlining while v-hasu was
> implementing additional ranking algorithms. So have a look at it and
> realign your thoughts while working on the proposal. He already tried to
> refactor questletor.cc into more independent tasks such as
> letor-prepare.cc, letor-train.cc etc.
>
> I have had a go at merging VcamX's master with xapian master, and the
> result is here: https://github.com/parthg/xapian
>
> Most of the conflicts are resolved except "MSet" related parts in
> enquire.h
>
> You can play with it if you get time, it would definitely give you more
> insight into the current code-base.
>
> Cheers
> Parth
>
>
>
> On Sun, Mar 20, 2016 at 7:32 PM, James Aylett <james-xapian at tartarus.org>
> wrote:
>
>> On Sun, Mar 20, 2016 at 05:31:37PM +0530, Ayush Tomar wrote:
>>
>> > I'm Ayush from New Delhi, India. I am interested in Letor Stabilisation
>> > project for GSoC. I have a good background in machine learning. Sorry for
>> > getting in so late, university exams were holding me back. I'll try to
>> > cover as much as I can in the coming week.
>>
>> Hi, Ayush. Welcome to Xapian!
>>
>> > 1. Modifying xapian-letor/bin/questletor.cc to use and test core features
>> > and API of letor. The current version of questletor.cc has a lot of
>> > unusable and broken functions and is custom made for training with INEX
>> > 2010 dataset. The intention is to make it usable for a user provided
>> > database. Currently I am using xapian-docsprint/data/100-objects-v1.csv as
>> > my database and some manually written queries and qrels to make things
>> > work.
>>
>> That's helpful; I haven't looked at questletor in a while. I'm not
>> surprised the master version doesn't work, because (as noted in the
>> project) there's code that we couldn't merge for licensing reasons.
>>
>> Note that where the project talks about tests, we mean automated
>> tests, probably unit tests. It's worth looking at how xapian-core does
>> these, because we'd expect a similar approach for xapian-letor. (I
>> think you're already clear on that, but I wanted to make sure!)
>>
>> > 2. Going through v-hasu's GSoC 2014 code to understand extra
>> > functionalities added by him and planning how to introduce code from his
>> > branch.
>>
>> Good.
>>
>> > 1. Creating a code example that lets the user use 100-objects-v1.csv as the
>> > database and use Letor features and API to make queries over it.
>> > Documenting how to make this example run.
>>
>> Note again that master probably won't be sufficient to do this. The
>> missing functionality (ie the unmerged work) was rewritten on v-hasu's
>> (Hanxiao Sun) branch, so can be pulled from there to form the base.
>>
>> > 3. Writing API and unit tests
>>
>> Note that, as the project description states, these should be done
>> alongside the integration work, rather than considered separately.
>>
>> > I have some questions:
>> >
>> > 1. Is the procedure I mentioned above the right way to go about it? What
>> > are the essential portions (in terms of code) that I should complete before
>> > submitting the proposal?
>>
>> It's not essential to complete any code ahead of the proposal, and as
>> you have only a week now to do the proposal that needs to be your
>> focus. Working with the code, however, is important to understand what
>> work needs to be done (and so will inform your proposal). So it's not
>> necessary to be able to submit pull requests yet, but the work you've
>> been doing in getting familiar with what code is there will form the
>> basis of your proposal.
>>
>> > 2. How can I create the test harness for xapian-letor similar to
>> > xapian-core and start writing tests? Tests seem somewhat overwhelming to me
>> > at the moment, it would be helpful if I could get some assistance on how to
>> > go about it.
>>
>> You'll need to copy the test harness. What I'd do is to copy the whole
>> of the xapian-core/tests directory, then cut out all the actual
>> tests. What's left should be the harness and supporting code. (You'll
>> need to write some more support to
>>
>> > 3. How important is writing new features for this project (for instance
>> > implementing LambdaMART ranking)? Should I focus on them as well in my
>> > proposal?
>>
>> Not at all. There's more than enough work in stabilising and
>> integrating previous work, writing tests and documentation, and
>> creating a fully-working system suitable for general use. If you were
>> to integrate all of v-hasu's branch and get that merged, then there's
>> VcamX's (Jiarong Wei) work to look at from 2014, although that would
>> require some more planning at the time (I wouldn't plan for that in
>> your proposal).
>>
>> J
>>
>> --
>>   James Aylett, occasional trouble-maker
>>   xapian.org
>>
>>
>