[Xapian-devel] Questions on letor module

Tue Mar 4 21:59:05 GMT 2014

Hi Jia,

> I have several questions regarding the letor module. I looked at the
> framework of learning to rank in xapian
> http://rishabhmehrotra.com/gsoc/17.png, and I am a little confused. Why use
> deep learning to find unsupervised features in the test data? In my
> understanding, a learning to rank model usually learns features from the
> training data and then applies the model to the test data. Why do the test
> set and training set have different features? Deep learning extracts
> hidden features from the data set, and I don't think it is necessary to use
> it in this problem. Furthermore, I didn't see any implementation of deep
> learning in the source code. Is it actually included in letor?
>

The GSoC project proposed by Rishabh was based on extracting unsupervised
features using deep learning on top of the existing features based on term
frequency and related statistics. It was never a tested hypothesis that this
would help; it was an added part. We have since dropped the idea of adding
this deep learning module, so you don't see any code related to it.

>
> For the source code
> https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
> the last update was about 2 years ago. Is that the latest version of the
> code? Several files, such as ranker.cc and evalmetric.cc, contain no
> function implementations, and I don't know whether they have been
> implemented somewhere else in the module (as far as I read through the
> source code, I didn't see any).
>

That is the latest version of the code and the starting point of this
year's GSoC project. The class in ranker.cc is abstract and is inherited by
the implemented rankers such as SVM, ListMLE and ListNET; you can find the
corresponding definitions in their .cc files. The evaluation part is yet to
be completed, as per the instructions given in evalmetric.h.
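
To make the abstract class point concrete, here is a minimal sketch of that
kind of layout. All names in it (Ranker, LinearRanker, learn_model, score,
FeatureVector) are assumptions made for illustration and are not taken from
the xapian-letor sources; the toy "training" is a placeholder, not SVM,
ListNET or ListMLE.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// All names here are hypothetical; this is not the xapian-letor API.
using FeatureVector = std::vector<double>;

// Abstract base class: declares the interface and implements nothing.
class Ranker {
  public:
    virtual ~Ranker() = default;

    // Fit the model from training vectors and their relevance labels.
    virtual void learn_model(const std::vector<FeatureVector>& features,
                             const std::vector<double>& labels) = 0;

    // Score one document; a higher score means "rank earlier".
    virtual double score(const FeatureVector& fv) const = 0;
};

// Toy concrete ranker standing in for an SVM/ListNET/ListMLE subclass.
class LinearRanker : public Ranker {
    std::vector<double> weights;

  public:
    // Placeholder training: each weight is the label-weighted sum of that
    // feature over the training set (not a real learning algorithm).
    void learn_model(const std::vector<FeatureVector>& features,
                     const std::vector<double>& labels) override {
        assert(features.size() == labels.size());
        weights.assign(features.empty() ? 0 : features[0].size(), 0.0);
        for (std::size_t i = 0; i < features.size(); ++i) {
            assert(features[i].size() == weights.size());
            for (std::size_t j = 0; j < weights.size(); ++j)
                weights[j] += labels[i] * features[i][j];
        }
    }

    // Dot product of the learned weights with the feature vector.
    double score(const FeatureVector& fv) const override {
        assert(fv.size() == weights.size());
        return std::inner_product(fv.begin(), fv.end(), weights.begin(), 0.0);
    }
};
```

Calling code would hold a Ranker reference or pointer and never needs to
know which concrete ranker sits behind it, which is why the base class file
itself has no function bodies to show.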

> For the tests, are there any benchmark tests of the SVM based or ListNET
> models on sample datasets, and what are their NDCG or MAP scores? (I
> didn't see any evaluation measures implemented in the current
> module.) And how about cross validation for the training set? Is there
> any method included in the current project?
>

For the SVM based model, a benchmark is available at
http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme

The first step of the new project will be to generate this figure for the
SVM based model with the refactored code, which was mostly written during
GSoC 2012 but never tested. We would appreciate it if prospective students
of the Letor project could generate this value before the student selection
deadline.

>
> For the SVM method, I found that letor_learn_model() has been commented
> out, but I didn't find any other file that contains this function (or
> maybe it is in letor_internal.cc)?
>
> Finally, I found a file called letor_internal_refactored.cc. Is that
> the latest version of letor_internal.cc? Is letor_internal.cc
> still being used?
>

Right. svmranker.cc is yet to be written. Right now the SVM based ranker is
available only in non-refactored form, which lives in
letor_internal_refactored.cc.

I think the best exercise is to prepare svmranker.cc from
letor_internal_refactored.cc by implementing the necessary methods and then
generating the MAP score reported on INEX data; that would give you a better
grip on the code. I would love to see a patch for it.
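
For reference, the MAP computation itself is small. The sketch below is
only an illustration under stated assumptions: the function names are made
up here (they are not what evalmetric.h specifies), relevance is binary, and
every relevant document for a query is assumed to appear in that query's
ranked list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average precision for one query (hypothetical helper).
// relevant[i] is true when the document returned at rank i+1 is relevant.
// Assumes every relevant document for the query appears in the list.
double average_precision(const std::vector<bool>& relevant) {
    double hits = 0.0;
    double precision_sum = 0.0;
    for (std::size_t i = 0; i < relevant.size(); ++i) {
        if (relevant[i]) {
            hits += 1.0;
            // Precision at this rank = relevant docs so far / rank.
            precision_sum += hits / static_cast<double>(i + 1);
        }
    }
    return hits > 0.0 ? precision_sum / hits : 0.0;
}

// Mean of the per-query average precision over all queries.
double mean_average_precision(const std::vector<std::vector<bool>>& runs) {
    assert(!runs.empty());
    double total = 0.0;
    for (const auto& run : runs)
        total += average_precision(run);
    return total / static_cast<double>(runs.size());
}
```

As a check by hand: a ranking of (relevant, not relevant, relevant) has
average precision (1/1 + 2/3) / 2 = 5/6, a ranking of (not relevant,
relevant) has 1/2, and the MAP over those two queries is 2/3.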

Cheers,
Parth.

> Thank you very much. I am waiting for your reply.
>
> --
> Jia Xu