[Xapian-devel] The workflow of letor module
Hanxiao Sun
sunhanxiaoisme at gmail.com
Thu Jun 5 08:56:50 BST 2014
Hi,
I am Hanxiao Sun, the student working on the letor module with Jiarong in
this year's GSoC.
I have some ideas about xapian-letor that I would like to discuss with
the community.
In the present code, we prepare a training file before we use the letor
module. When we have a query, we train the SVM model using the default
parameters. We then treat the MSet returned by the query as the test set:
the trained SVM model predicts the ranking of the documents in the MSet.
To be clear, the test set has no labels (ground truth), so we cannot
(and perhaps need not) evaluate the ranking result in the current
workflow.
For normal users, the current workflow is OK (although the problem of how
to obtain the training file from these users has still not been solved).
They don't care about the specific parameters of each ranking module; they
just want the best possible ranking result.
But other users, such as those who want to tune the parameters or add
features to the ranking module, also want to evaluate the ranking
result on the test set. In other words, they will have ground truth in
their test set and need to use the metric module in the test process.
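As a rough illustration of what evaluating a ranking against ground truth
involves, here is a minimal sketch that assumes NDCG@k as the metric. NDCG is
only one common choice picked for the example; which metrics the metric
module provides is not stated here. The function names and the use of
integer graded relevance labels are also assumptions, not part of
xapian-letor.

```python
import math


def dcg(labels, k):
    """Discounted cumulative gain over the first k relevance labels.

    `labels` are the ground-truth relevance grades of the documents,
    listed in the order the ranker returned them (hypothetical input format).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels[:k]))


def ndcg(ranked_labels, k):
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg(sorted(ranked_labels, reverse=True), k)
    # With no relevant documents the ideal DCG is 0; report 0.0 in that case.
    return dcg(ranked_labels, k) / ideal if ideal > 0 else 0.0
```

Without labels for the documents in the MSet there is nothing to pass as
`ranked_labels`, which is why the evaluation step only applies to the second
group of users.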
The difference between these two groups of users is that we call the
metric module to evaluate the ranking result if the test set has ground
truth, and otherwise we don't. This raises the question of whether we need
an independent script that calls the metric module outside "questletor".
If we don't separate the evaluation process from "questletor", the user
has to choose which mode to run "questletor" in: with ground truth or
without. But if we do separate it, users will have a little trouble when
they want to do k-fold cross-validation: they need to split the data
themselves and run "questletor" and the evaluation script k times.
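To make the k-fold concern concrete, the manual procedure could look roughly
like the sketch below. It is only an outline under assumptions: the data is
split by query id, and `train_and_rank` and `evaluate` are hypothetical
callbacks standing in for one "questletor" run and one run of the separate
evaluation script. Neither is an existing xapian-letor function.

```python
def k_fold_splits(query_ids, k):
    """Yield (train_ids, test_ids) pairs, splitting the query ids into k folds."""
    if not 1 < k <= len(query_ids):
        raise ValueError("k must be between 2 and the number of queries")
    # Round-robin assignment keeps the fold sizes within one of each other.
    folds = [query_ids[i::k] for i in range(k)]
    for i, test_ids in enumerate(folds):
        train_ids = [q for j, fold in enumerate(folds) if j != i for q in fold]
        yield train_ids, test_ids


def cross_validate(query_ids, k, train_and_rank, evaluate):
    """Run k train/rank/evaluate rounds and return the mean score."""
    scores = []
    for train_ids, test_ids in k_fold_splits(query_ids, k):
        # Hypothetical stand-in for running "questletor" once on this split.
        rankings = train_and_rank(train_ids, test_ids)
        # Hypothetical stand-in for running the evaluation script once.
        scores.append(evaluate(rankings))
    return sum(scores) / len(scores)
```

With the evaluation kept outside "questletor", each iteration of this loop
corresponds to two separate program runs that the user has to drive by hand
or with a wrapper script.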
I am not sure whether I understand this correctly, and this issue seems
more relevant to Jiarong's part. However, I still want to make it
clear. Any comments and suggestions would be appreciated.
Thanks!
--
孙晗晓(Hanxiao Sun)
Master Student of Computer Science at Institute of Computing
Technology, Chinese Academy of Sciences (ICT)
Email: sunhanxiaoisme at gmail.com
Mobile: (86)186-0025-6936