[Xapian-devel] The workflow of letor module

Hanxiao Sun sunhanxiaoisme at gmail.com
Thu Jun 5 08:56:50 BST 2014


I am Hanxiao Sun, the student working on the letor module with Jiarong in
this year's GSoC.

I have some ideas about xapian-letor that I would like to discuss with
the community.

In the present code, we prepare a training file before we use the letor
module. When we have a query, we train the SVM module using the default
parameters. Then we use the MSet returned by the query as the test set:
the trained SVM module predicts the ranking of the MSet. To be clear, we
have no labels (ground truth) in the test set, so we cannot (and perhaps
do not need to) evaluate the ranking result in the current workflow.

For normal users, the current workflow is OK (although the problem of how
to obtain the training file from these users has still not been solved).
They do not care about the specific parameters of each ranking module; they
just want the best possible ranking result.

But other users, such as those who want to tune the parameters or add
features to the ranking module, also want to evaluate the ranking result
on the test set. In other words, they will have the ground truth in their
test set and need to use the metric module in the test process.
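
To make the evaluation step concrete, here is a minimal sketch in Python.
It assumes NDCG@k as the metric, which is only one common choice for
ranking evaluation; the metric module may well use something else. The
function names are hypothetical and are not part of xapian-letor.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))

def ndcg_at_k(ranked_labels, k):
    """NDCG@k for one query.

    ranked_labels holds the ground-truth relevance labels of the
    documents, listed in the order the ranker returned them.
    """
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant document for this query
    return dcg_at_k(ranked_labels, k) / ideal
```

A ranking that already puts the most relevant documents first scores 1.0;
any other order of the same documents scores lower.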

The difference between these two groups of users is that we call the
metric module to evaluate the ranking result if the test set has ground
truth, and otherwise we do not. This raises the question of whether we need
an independent script to call the metric module outside "questletor". If
we do not separate the evaluation process from "questletor", the user has
to choose the mode in which they run "questletor": with ground truth or
without. But if we do separate it, users will have a little trouble when
they want to do k-fold cross-validation: they need to split the data
themselves and run "questletor" and the evaluation script k times.
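
The splitting that users would otherwise do by hand could look roughly
like the Python sketch below. It assumes the labelled data can be grouped
by query id, so that all documents of one query stay in the same fold.
The names k_fold_splits, cross_validate and run_fold are hypothetical;
run_fold stands for whatever trains on the training queries, ranks the
held-out queries and returns a metric score.

```python
def k_fold_splits(query_ids, k):
    """Yield (train_ids, test_ids) pairs, holding out each fold once."""
    query_ids = list(query_ids)
    if not 2 <= k <= len(query_ids):
        raise ValueError("k must be between 2 and the number of queries")
    # Deal the queries round-robin into k folds of near-equal size.
    folds = [query_ids[i::k] for i in range(k)]
    for i, test_ids in enumerate(folds):
        train_ids = [q for j, fold in enumerate(folds) if j != i
                     for q in fold]
        yield train_ids, test_ids

def cross_validate(query_ids, k, run_fold):
    """Average the score of run_fold(train_ids, test_ids) over k folds."""
    scores = [run_fold(train_ids, test_ids)
              for train_ids, test_ids in k_fold_splits(query_ids, k)]
    return sum(scores) / len(scores)
```

If something like this lived next to the evaluation code, a single call
could replace the k manual runs.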

I am not sure whether I understand this correctly, and this issue seems
more relevant to Jiarong's part. However, I still want to make it clear.
Any comments and suggestions will be appreciated.

孙晗晓(Hanxiao Sun)
Master Student of Computer Science at Institute of Computing
Technology, Chinese Academy of Sciences (ICT)
Email: sunhanxiaoisme at gmail.com
Mobile: (86)186-0025-6936

