<div dir="ltr">Thank you, Parth. This really helps me understand the project.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Mar 4, 2014 at 1:59 PM, Parth Gupta <span dir="ltr">&lt;<a href="mailto:pargup8@gmail.com" target="_blank">pargup8@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Jia,<br><br><div class="gmail_extra"><div class="gmail_quote"><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr"><div><div>
I have several questions regarding the letor module. <span style="font-family:arial,sans-serif;font-size:17.3333px">I looked at the learning-to-rank framework diagram for Xapian at </span><a href="http://rishabhmehrotra.com/gsoc/17.png" style="font-family:arial,sans-serif;font-size:17.3333px" target="_blank">http://rishabhmehrotra.com/gsoc/17.png</a><span style="font-family:arial,sans-serif;font-size:17.3333px"> and I am a little confused. Why use deep learning to find unsupervised features in the test data? In my understanding, a learning-to-rank model is usually trained on features from the training data and then applied to the test data, so why would the test set and the training set have different features? Deep learning extracts hidden features from a data set, and I don't think it is necessary for this problem. Furthermore, I didn't see any deep learning implementation in the source code. Is it actually included in letor? </span></div>
</div></div></blockquote><div><br></div></div><div>The idea of the GSoC project proposed by Rishabh was to extract unsupervised features using deep learning on top of the existing features based on term frequency and related statistics. Whether this would help is an untested hypothesis, but it was an added part of the proposal. We later dropped the idea of adding this deep learning module, so you don't see any code related to it.<br>
</div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<div style="font-family:arial,sans-serif;font-size:17.3333px"><br></div><div style="font-family:arial,sans-serif;font-size:17.3333px"> For the source code at <a href="https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor" target="_blank">https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor</a>, the last update was about 2 years ago. Is that the latest version of the code? Several files, such as ranker.cc and evalmetric.cc, contain no function implementations. I don't know whether they have been implemented somewhere else in the module (as far as I can tell from reading the source code, they have not). </div>
</div></div></blockquote><div><br></div></div><div>That is the latest version of the code and the starting point of this year's GSoC project. ranker.cc defines an abstract class that is inherited by the implemented rankers such as SVM, ListMLE and ListNET; the corresponding definitions can be found in their .cc files. The evaluation part is yet to be completed as per the instructions given in evalmetric.h.<br>
<br></div><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<div style="font-family:arial,sans-serif;font-size:17.3333px"> For the tests, are there any benchmarks of the SVM-based or ListNet models on sample datasets, and what are their NDCG or MAP scores? (I didn't see any evaluation measures implemented in the current module.) And how about cross-validation on the training set? Is any method for it included in the current project? </div>
</div></div></blockquote><div><br></div></div><div>For the SVM-based model, benchmark results are available at <a href="http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme" target="_blank">http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme</a><br>
<br></div><div>Actually, the first step of the new project will be to generate this figure for the SVM-based model with the new refactored code, which was mostly written during GSoC 2012 but never tested. We would appreciate it if prospective students of the Letor project could generate this value before the student selection deadline.<br>
</div><div class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<div style="font-family:arial,sans-serif;font-size:17.3333px"><br></div><div style="font-family:arial,sans-serif;font-size:17.3333px">For the SVM method, I found that letor_learn_model() has been commented out, but I didn't find any other file that contains this function (or is it maybe in letor_internal.cc?). </div>
<div style="font-family:arial,sans-serif;font-size:17.3333px"><br></div><div style="font-family:arial,sans-serif;font-size:17.3333px">Finally, I found a file called letor_internal_refactored.cc. Is that the latest version of letor_internal.cc? Is letor_internal.cc </div>
<div style="font-family:arial,sans-serif;font-size:17.3333px">still being used?</div></div></div></blockquote><div><br></div></div><div>Right. svmranker.cc is yet to be written. Right now the SVM-based ranker is available only in non-refactored form, which lies in letor_internal_refactored.cc <br>
<br></div><div>I think the best exercise is to prepare svmranker.cc from letor_internal_refactored.cc by implementing the necessary methods and generating the MAP score reported on INEX data. That would give you a better grip on the code. I would love to see a patch for it.<br>
<br></div><div>Cheers,<br></div><div>Parth.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div dir="ltr"><div>
<div style="font-family:arial,sans-serif;font-size:17.3333px">
<br></div><div><span style="font-family:arial,sans-serif;font-size:17.3333px">Thank you very much. I am waiting for your reply. </span> <span><font color="#888888"><br clear="all">
<div><br></div>-- <br>Jia Xu<br><br>
</font></span></div></div></div>
<br></div>_______________________________________________<br>
Xapian-devel mailing list<br>
<a href="mailto:Xapian-devel@lists.xapian.org" target="_blank">Xapian-devel@lists.xapian.org</a><br>
<a href="http://lists.xapian.org/mailman/listinfo/xapian-devel" target="_blank">http://lists.xapian.org/mailman/listinfo/xapian-devel</a><br>
<br></blockquote></div><br></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Jia Xu<br><br>
</div>