<div dir="ltr">Hi Mayank,<br><br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div>

<br></div>Before I start working on svmranker.cc, I need to discuss a few things.<br></div></div></div></div></div></blockquote><div><br></div><div>Yes, it is a good idea to get some insight into the framework before actually starting to write anything. <br>

<br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div>For <b>featurevector.h </b>-<br><br></div><div style="margin-left:40px">


1. I think it is a header file for the data structure used to store a query's relevance, though the file itself says <a href="https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featurevector.h#L1" target="_blank">This file responsible for transforming the document into the feature space</a> . Also, all the methods there are <b>get</b> and <b>set</b> methods except <b>load_relevance</b>(). The same method is also present in <a href="https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featuremanager.h#L55" target="_blank">featuremanager.h</a> , and the two implementations are identical. I can't see why the same method is present in two headers.<br>


<a href="http://trac.xapian.org/wiki/GSoC2012/LTR/TODO" target="_blank">http://trac.xapian.org/wiki/GSoC2012/LTR/TODO</a> also says that there shouldn't be a load_relevance() method in featurevector.h .<br></div></div>

</div></div></blockquote><div><br></div><div>Some redundancy is to be expected, since the code has not been cleaned up and the project unfortunately could not be finished. Yes, load_relevance() belongs more naturally in featuremanager than in featurevector.<br>

<br></div><div>Basically, the featurevector operates at the document level and the ranklist operates at the query level. One query has many documents related to it, so the values that are common to all of a query's documents live in ranklist, and the information that pertains to a single document lives in featurevector. Featuremanager does most of the work of constructing the featurevector and fetching the statistics it needs. <br>
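To make the document-level versus query-level split concrete, here is a minimal C++ sketch. The names FeatureVectorSketch and RankListSketch and all of their members are hypothetical, chosen only for illustration; they are not the actual xapian-letor classes, whose members may differ.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only, not the real xapian-letor classes.

// Document-level data: one instance per (query, document) pair.
struct FeatureVectorSketch {
    std::string did;              // document id (assumed name)
    double label = 0.0;           // relevance judgement for this document
    std::map<int, double> fvals;  // feature index -> feature value
};

// Query-level data: one instance per query. Anything shared by all of the
// query's documents (here just the query id) is stored once at this level.
struct RankListSketch {
    std::string qid;                        // query id (assumed name)
    std::vector<FeatureVectorSketch> docs;  // the documents for this query

    void add(FeatureVectorSketch fv) { docs.push_back(std::move(fv)); }
};
```

With this layout the query id is stored once per RankListSketch rather than once per document, which matches where queryid sits in ranklist.h.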

<br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div style="margin-left:40px"><br>2. As it was mentioned in a mail by Jiarong Wei, the data member <i>label</i> should be of type <b>bool</b> rather than <b>double</b>. The data member <i>fcount</i> is also unused.<br>

</div></div></div></div></blockquote><div><br></div><div>As I just answered him, many Letor datasets have more than two relevance levels (Letor 3.0 and 4.0 have three relevance levels, and the Yahoo! Letor dataset has five). The other reason for keeping it a double is that when the ranking algorithm assigns a real-valued relevance score to the feature vector, it can be stored in the same place. The evaluationmetric should sort the documents based on this number.<br>
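As a rough sketch of why a double is convenient here: the same field can hold a graded judgement (0, 1, 2, ...) or a real-valued score from the ranker, and sorting works identically in both cases. ScoredDoc and sort_by_label are hypothetical names for illustration, not part of xapian-letor.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical illustration only, not the real xapian-letor types.
struct ScoredDoc {
    std::string did;  // document id
    double label;     // graded relevance or a real-valued ranker score
};

// Order documents from highest to lowest label. stable_sort keeps documents
// with equal labels in their original relative order.
inline void sort_by_label(std::vector<ScoredDoc>& docs) {
    std::stable_sort(docs.begin(), docs.end(),
                     [](const ScoredDoc& a, const ScoredDoc& b) {
                         return a.label > b.label;
                     });
}
```

A bool label could not represent either the multi-level judgements or the ranker's scores, so it would need a second field for the latter.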

<br></div><div>Yes, 'fcount' must be used and it is a TODO.<br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div>

<div style="margin-left:40px">


<br>3. Since it is a feature vector, there should be a data member <i>queryid</i>, but I found that it is in <a href="https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/ranklist.h#L50" target="_blank">ranklist.h</a> . <br>

</div></div></div></div></blockquote><div><br></div><div>Just see the explanation to point 1. <br> <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div dir="ltr"><div><div><div style="margin-left:40px">


</div></div></div><div><br></div><div>Other than that, I wanted to know whether ListMLE and ListNet have been tested. And what is autoencoder.cc for, and where is the "<span>dimred/ya_ate_dimred.h" header that it includes?<span class="HOEnZb"><font color="#888888"><br>

</font></span></span></div></div></blockquote><div><br></div><div>ListMLE and ListNet have not been tested, and Rishabh did not mention their performance either. We only have the benchmark evaluation of svmranker. You can ignore autoencoder.cc: it was part of Rishabh's idea to add unsupervised features, learned with deep learning, to the feature vector alongside the conventional features. <br>

<br></div><div>Cheers,<br></div><div>Parth.<br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><span><span class="HOEnZb"><font color="#888888">


</font></span></span></div><span class="HOEnZb"><font color="#888888"><div><span><br></span></div><div><span>-Mayank<br></span></div></font></span></div>

<br>_______________________________________________<br>

Xapian-devel mailing list<br>

<a href="mailto:Xapian-devel@lists.xapian.org">Xapian-devel@lists.xapian.org</a><br>

<a href="http://lists.xapian.org/mailman/listinfo/xapian-devel" target="_blank">http://lists.xapian.org/mailman/listinfo/xapian-devel</a><br>

<br></blockquote></div><br></div></div>