<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">Hi Ayush,<br><br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">In addition to what James says, I would recommend focusing first on VcamX's branch, as he was working on API streamlining while v-hasu was implementing additional ranking algorithms. Have a look at it and realign your thoughts accordingly while working on the proposal. He has already tried to refactor questletor.cc into more independent tasks, such as letor-prepare.cc and letor-train.cc.<br><br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">I have had a go at merging VcamX's master with xapian master; the result is here: <a href="https://github.com/parthg/xapian">https://github.com/parthg/xapian</a> <br><br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">Most of the conflicts are resolved, except for the MSet-related parts in enquire.h.<br><br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">You can play with it if you have time; it would definitely give you more insight into the current code base.<br><br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">Cheers<br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)">Parth<br></div>
<div class="gmail_default" style="font-family:monospace,monospace;font-size:small;color:rgb(11,83,148)"><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 20, 2016 at 7:32 PM, James Aylett <span dir="ltr">&lt;<a href="mailto:james-xapian@tartarus.org" target="_blank">james-xapian@tartarus.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sun, Mar 20, 2016 at 05:31:37PM +0530, Ayush Tomar wrote:<br>
<br>
> I'm Ayush from New Delhi, India. I am interested in the Letor Stabilisation<br>
> project for GSoC. I have a good background in machine learning. Sorry for<br>
> getting in touch so late; university exams were holding me back. I'll try to<br>
> cover as much as I can in the coming week.<br>
<br>
</span>Hi, Ayush. Welcome to Xapian!<br>
<span class=""><br>
> 1. Modifying xapian-letor/bin/questletor.cc to use and test the core features<br>
> and API of letor. The current version of questletor.cc has a lot of<br>
> unusable and broken functions and is custom-made for training with the INEX<br>
> 2010 dataset. The intention is to make it usable with a user-provided<br>
> database. Currently I am using xapian-docsprint/data/100-objects-v1.csv as<br>
> my database, along with some manually written queries and qrels, to make<br>
> things work.<br>
<br>
</span>That's helpful; I haven't looked at questletor in a while. I'm not<br>
surprised the master version doesn't work, because (as noted in the<br>
project description) there's code that we couldn't merge for licensing reasons.<br>
<br>
Note that where the project talks about tests, we mean automated<br>
tests, probably unit tests. It's worth looking at how xapian-core does<br>
these, because we'd expect a similar approach for xapian-letor. (I<br>
think you're already clear on that, but I wanted to make sure!)<br>
<span class=""><br>
> 2. Going through v-hasu's GSoC 2014 code to understand the extra<br>
> functionality he added and planning how to bring in code from his<br>
> branch.<br>
<br>
</span>Good.<br>
<span class=""><br>
> 1. Creating a code example that lets the user use 100-objects-v1.csv as the<br>
> database and query it using the Letor features and API.<br>
> Documenting how to run this example.<br>
<br>
</span>Note again that master probably won't be sufficient to do this. The<br>
missing functionality (i.e. the unmerged work) was rewritten on v-hasu's<br>
(Hanxiao Sun's) branch, so it can be pulled from there to form the base.<br>
<span class=""><br>
> 3. Writing API and unit tests<br>
<br>
</span>Note that, as the project description states, these should be done<br>
alongside the integration work, rather than considered separately.<br>
<span class=""><br>
> I have some questions:<br>
><br>
> 1. Is the procedure I mentioned above the right way to go about it? What<br>
> are the essential portions (in terms of code) that I should complete before<br>
> submitting the proposal?<br>
<br>
</span>It's not essential to complete any code ahead of the proposal, and as<br>
you now have only a week to write the proposal, that needs to be your<br>
focus. Working with the code, however, is important for understanding what<br>
work needs to be done (and so will inform your proposal). So it's not<br>
necessary to be able to submit pull requests yet, but the work you've<br>
been doing to get familiar with the existing code will form the<br>
basis of your proposal.<br>
<span class=""><br>
> 2. How can I create a test harness for xapian-letor similar to<br>
> xapian-core's and start writing tests? Tests seem somewhat overwhelming to me<br>
> at the moment; it would be helpful if I could get some assistance on how to<br>
> go about it.<br>
<br>
</span>You'll need to copy the test harness. What I'd do is copy the whole<br>
of the xapian-core/tests directory, then cut out all the actual<br>
tests. What's left should be the harness and supporting code. (You'll<br>
need to write some more support to<br>
<span class=""><br>
> 3. How important is writing new features for this project (for instance<br>
> implementing LambdaMART ranking)? Should I focus on them as well in my<br>
> proposal?<br>
<br>
</span>Not at all. There's more than enough work in stabilising and<br>
integrating previous work, writing tests and documentation, and<br>
creating a fully-working system suitable for general use. If you were<br>
to integrate all of v-hasu's branch and get that merged, then there's<br>
VcamX's (Jiarong Wei's) 2014 work to look at, although that would<br>
require some more planning at the time (I wouldn't plan for that in<br>
your proposal).<br>
<span class="HOEnZb"><font color="#888888"><br>
J<br>
<br>
--<br>
James Aylett, occasional trouble-maker<br>
<a href="http://xapian.org" rel="noreferrer" target="_blank">xapian.org</a><br>
<br>
</font></span></blockquote></div><br></div>