GSoC 2016 Letor Stabilisation

Parth Gupta pargup8 at gmail.com
Mon Mar 21 17:59:22 GMT 2016


Hi Ayush,

On top of what James has said, I would recommend focusing first on
VcamX's branch, as he was working on API streamlining while v-hasu was
implementing additional ranking algorithms. Have a look at it and
realign your thoughts while working on the proposal. He has already tried
to refactor questletor.cc into more independent tasks such as
letor-prepare.cc, letor-train.cc, etc.

I have had a go at merging VcamX's master with xapian master, and the
result is here: https://github.com/parthg/xapian

Most of the conflicts are resolved, except for the MSet-related parts in
enquire.h.

You can play with it if you get time; it would definitely give you more
insight into the current code base.

Cheers
Parth



On Sun, Mar 20, 2016 at 7:32 PM, James Aylett <james-xapian at tartarus.org>
wrote:

> On Sun, Mar 20, 2016 at 05:31:37PM +0530, Ayush Tomar wrote:
>
> > I'm Ayush from New Delhi, India. I am interested in Letor Stabilisation
> > project for GSoC. I have a good background in machine learning. Sorry for
> > getting in so late, university exams were holding me back. I'll try to
> > cover as much as I can in the coming week.
>
> Hi, Ayush. Welcome to Xapian!
>
> > 1. Modifying xapian-letor/bin/questletor.cc to use and test core features
> > and API of letor. The current version of questletor.cc has a lot of
> > unusable and broken functions and is custom made for training with INEX
> > 2010 dataset. The intention is to make it usable for a user provided
> > database. Currently I am using xapian-docsprint/data/100-objects-v1.csv
> > as my database and some manually written queries and qrels to make
> > things work.
>
> That's helpful; I haven't looked at questletor in a while. I'm not
> surprised the master version doesn't work, because (as noted in the
> project) there's code that we couldn't merge for licensing reasons.
>
> Note that where the project talks about tests, we mean automated
> tests, probably unit tests. It's worth looking at how xapian-core does
> these, because we'd expect a similar approach for xapian-letor. (I
> think you're already clear on that, but I wanted to make sure!)
>
> > 2. Going through v-hasu's GSoC 2014 code to understand extra
> > functionalities added by him and planning how to introduce code from his
> > branch.
>
> Good.
>
> > 1. Creating a code example that lets the user use 100-objects-v1.csv as
> > the database and use Letor features and API to make queries over it.
> > Documenting how to make this example run.
>
> Note again that master probably won't be sufficient to do this. The
> missing functionality (ie the unmerged work) was rewritten on v-hasu's
> (Hanxiao Sun) branch, so can be pulled from there to form the base.
>
> > 3. Writing API and unit tests
>
> Note, as the project description states, that these should be done
> alongside the integration work, rather than considered separately.
>
> > I have some questions:
> >
> > 1. Is the procedure I mentioned above the right way to go about it? What
> > are the essential portions (in terms of code) that I should complete
> > before submitting the proposal?
>
> It's not essential to complete any code ahead of the proposal, and as
> you have only a week now to do the proposal that needs to be your
> focus. Working with the code, however, is important to understand what
> work needs to be done (and so will inform your proposal). So it's not
> necessary to be able to submit pull requests yet, but the work you've
> been doing in getting familiar with what code is there will form the
> basis of your proposal.
>
> > 2. How can I create the test harness for xapian-letor similar to
> > xapian-core and start writing tests? Tests seem somewhat overwhelming to
> > me at the moment, it would be helpful if I could get some assistance on
> > how to go about it.
>
> You'll need to copy the test harness. What I'd do is to copy the whole
> of the xapian-core/tests directory, then cut out all the actual
> tests. What's left should be the harness and supporting code. (You'll
> need to write some more support to
>
> > 3. How important is writing new features for this project (for instance
> > implementing LambdaMART ranking)? Should I focus on them as well in my
> > proposal?
>
> Not at all. There's more than enough work in stabilising and
> integrating previous work, writing tests and documentation, and
> creating a fully-working system suitable for general use. If you were
> to integrate all of v-hasu's branch and get that merged, then there's
> VcamX's (Jiarong Wei) work to look at from 2014, although that would
> require some more planning at the time (I wouldn't plan for that in
> your proposal).
>
> J
>
> --
>   James Aylett, occasional trouble-maker
>   xapian.org
>
>