GSoC 2016 Letor Stabilisation

Ayush Tomar ayushtomar at gmail.com
Sun Mar 20 12:01:37 GMT 2016


Hello,

I'm Ayush from New Delhi, India. I am interested in the Letor Stabilisation
project for GSoC. I have a good background in machine learning. Sorry for
getting in touch so late; university exams were holding me back. I'll try
to cover as much as I can in the coming week.

I am following the plan of attack suggested on the project page. So far I
have completed the following:

1. Getting the current master branch building cleanly.
2. Going through all resources and papers mentioned on the project page.
3. Generating lcov test coverage reports.
4. Going through the code in current master of xapian-letor and
understanding all of its functionality.


I am currently working on the following:

1. Modifying xapian-letor/bin/questletor.cc to use and test the core
features and API of letor. The current version of questletor.cc has a lot
of unusable and broken functions and is custom-made for training with the
INEX 2010 dataset. The intention is to make it usable with a user-provided
database. Currently I am using xapian-docsprint/data/100-objects-v1.csv as
my database, with some manually written queries and qrels, to make things
work.
2. Going through v-hasu's GSoC 2014 code to understand the extra
functionality he added and planning how to introduce code from his
branch.
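
For reference, here is a minimal sketch of how the manually written qrels
could be loaded for experiments. It assumes a TREC-style layout of
"query_id iteration doc_id relevance" per line; that layout, the function
name parse_qrels and the sample ids are my assumptions for illustration,
not what questletor.cc currently does.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map query id -> {doc id: relevance} from TREC-style qrel lines.

    Assumed line layout (hypothetical): query_id iteration doc_id relevance
    """
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        qid, _iteration, docid, rel = line.split()
        qrels[qid][docid] = int(rel)
    return dict(qrels)


# Hypothetical hand-written judgements for two queries.
sample = [
    "q1 Q0 12 1",
    "q1 Q0 47 0",
    "q2 Q0 3 1",
]
qrels = parse_qrels(sample)
```

Keeping the judgements keyed by query id makes it easy to look up the
label for each (query, document) pair when building training data.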

In summary, the approach I plan to follow is:

1. Creating a code example that lets the user use 100-objects-v1.csv as the
database and use Letor features and API to make queries over it.
Documenting how to make this example run.
2. Introducing features from the 2014 project, adding them to the above
example and documenting them.
3. Writing API and unit tests.

I have some questions:

1. Is the procedure I mentioned above the right way to go about it? What
are the essential portions (in terms of code) that I should complete before
submitting the proposal?
2. How can I create a test harness for xapian-letor similar to the one in
xapian-core and start writing tests? Tests seem somewhat overwhelming to me
at the moment, so it would be helpful if I could get some assistance on how
to go about it.
3. How important is writing new features for this project (for instance
implementing LambdaMART ranking)? Should I focus on them as well in my
proposal?
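
To make questions 2 and 3 a bit more concrete: one check I imagine a test
could make is that a ranker's ordering scores well against the qrels. Below
is a minimal sketch of NDCG@k, a metric commonly used to evaluate
learning-to-rank methods such as LambdaMART. The function names and the
sample labels are hypothetical, and this is not taken from xapian-letor.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # With no relevant documents the ideal DCG is 0; report 0.0 then.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of the top 3 documents, in ranked order.
ranked_labels = [1, 0, 1]
score = ndcg_at_k(ranked_labels, 3)
```

A perfect ordering gives a score of 1.0, so a test could assert that a
trained ranker's score on a small fixed dataset stays above some threshold.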

Thanks,
Ayush