[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)

Olly Betts olly at survex.com
Thu Mar 21 22:37:24 GMT 2013


On Thu, Mar 21, 2013 at 06:58:41PM +0530, Mudit Gupta wrote:
> I am interested in the "Learning To Rank" project.  If I am not mistaken,
> the framework implemented by Parth is in the cloned code. It needed some
> refactoring in order to incorporate more algorithms; Rishabh did that work
> and it is available in his git repo (https://github.com/rishabhmehrotra/xapian)
> but is still not merged. So, I assume I should plan my additions against
> the code in Rishabh's repo.

Yes, I think that's the best starting point.

> Moreover, I noticed that SVM-rank, ListMLE and
> ListNet are already present in the code. I am interested in adding a
> random forest approach and am looking for appropriate libraries. It would
> be great to get input from the Xapian community on which algorithms and
> open source libraries are preferred. It would also be great to know the
> priority of the Letor project to the Xapian community.

Parth and I talked this over recently, and we concluded that this year a
major focus should be on consolidating the existing work.  That doesn't
necessarily mean that new features can't be looked at, but one of the
deliverables should really be a xapian-letor module which we're happy to
tag as a stable release.  A project which adds more algorithms is
interesting, but if the end result isn't useful to Xapian users, there's
much less benefit to be had from it.

One of the major things missing is a testsuite.  Without any automated
tests, it's hard to have much confidence that the code works, and it
makes it much harder to make changes to the code in the future without
introducing new bugs.  So I think adding a testsuite is important.
The harness from xapian-core is suitable, but testcases need writing,
and the bugs that actually writing testcases will inevitably uncover
need fixing.

We should also look at what features are missing from xapian-core
which would be useful for xapian-letor, and consider implementing them -
especially if they have other potential uses.  Two that I'm aware of
are:

* Fundamentally, xapian-letor wants to take a Xapian::MSet object and
  reorder it, so an API which allows that would be handy - then the
  output of xapian-letor can be a Xapian::MSet object, allowing it to
  be cleanly slotted into existing applications using the Xapian API.
  An MSet reordering API also has other potential uses - for example,
  clustering results.

* Field-related features currently have to be calculated specially by
  xapian-letor, but these would also be useful to have for other uses
  (e.g. implementing BM25f) so tracking them in the database backend
  in xapian-core is worth investigating.
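
To make the reordering idea in the first point concrete, here is a minimal
sketch in plain C++.  It does not use the Xapian API: RankedDoc and rerank()
are hypothetical names made up for illustration, standing in for whatever an
MSet reordering API might eventually look like.  It assumes the ranking model
has already produced one new score per result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of a result set: a document id and
// its score.  This is NOT Xapian::MSet, just an illustration.
struct RankedDoc {
    unsigned docid;
    double score;
};

// Return a copy of `results` carrying the scores in `new_scores`, sorted
// with the highest score first.  new_scores[i] is assumed to be the score
// the ranking model assigned to results[i].
std::vector<RankedDoc>
rerank(const std::vector<RankedDoc>& results,
       const std::vector<double>& new_scores)
{
    // One new score per result is a precondition of this sketch.
    assert(results.size() == new_scores.size());

    std::vector<RankedDoc> out;
    out.reserve(results.size());
    for (std::size_t i = 0; i < results.size(); ++i) {
        out.push_back(RankedDoc{results[i].docid, new_scores[i]});
    }

    // stable_sort keeps the original relative order for equal scores.
    std::stable_sort(out.begin(), out.end(),
                     [](const RankedDoc& a, const RankedDoc& b) {
                         return a.score > b.score;
                     });
    return out;
}
```

The point of returning the same kind of object as went in is that callers
written against the original result list would not need to change, which is
the property that makes an MSet-in, MSet-out API attractive.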

I'll update the entry on the project ideas page with the above shortly.

Cheers,
    Olly


