GSoC 2017: Letor Click Data Mining

Vivek Pal vivekpal.dtu at gmail.com
Thu Mar 23 06:18:26 GMT 2017


> You could do that by identifying the search session instead of the user,
> which makes it closer to what we need than to something that might trip you
> into privacy concerns.

Okay, that would be much better. :)

> Third records some information about what sort of query it is — add,
> morelike or a plain query. Last provides the estimated match size and then
> the HTTP referrer if one were set. Neither is particularly interesting in
> this case.

Thanks for the explanation. So, as I understand it, we'll need to log more
information than this to be able to train click models for relevance
judgements.

> and you'll need a way to use letor from omega, or you'll have trained a
> model for no good reason :)

Sorry, I may have misunderstood you here, but why would we need a way to use
letor from omega? To train the Letor module, wouldn't we just need two files,
i.e. Query and Qrel, as mentioned in the xapian-letor docs? The Letor API can
then generate the final training file from those two files.

And to mine the relevance judgements for the Qrel file from logs, we'll need
to train one of the click models, such as DBN.

Is there a better way to mine the relevance judgements than click models?
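
To make the click model idea concrete, here is a rough sketch of how
judgements could be estimated from session logs, in the spirit of a simplified
Dynamic Bayesian Network click model. Everything in it is an assumption for
illustration: the SESSIONS log layout, the function names, the 0.5 threshold
and the TREC-style "qid Q0 docid rel" output lines are all hypothetical, and
the layout xapian-letor actually expects should be checked against its docs.

```python
from collections import defaultdict

# Hypothetical session log: (query, ranked doc ids shown, clicked doc ids).
SESSIONS = [
    ("xapian letor", ["d1", "d2", "d3"], {"d2"}),
    ("xapian letor", ["d1", "d2", "d3"], {"d1", "d2"}),
    ("xapian letor", ["d1", "d2", "d3"], set()),
]


def estimate_relevance(sessions):
    """Score each (query, doc) as attractiveness * satisfaction.

    Assumption: documents up to the last click count as examined; if a
    session has no clicks, the whole list counts as examined.
    """
    examined = defaultdict(int)   # times the doc is assumed examined
    clicked = defaultdict(int)    # times the doc was clicked
    satisfied = defaultdict(int)  # times the doc was the last click
    for query, shown, clicks in sessions:
        positions = [i for i, doc in enumerate(shown) if doc in clicks]
        last = positions[-1] if positions else len(shown) - 1
        for i, doc in enumerate(shown[: last + 1]):
            key = (query, doc)
            examined[key] += 1
            if doc in clicks:
                clicked[key] += 1
                if i == last:
                    satisfied[key] += 1
    scores = {}
    for key, n in examined.items():
        attract = clicked[key] / n
        satisfy = satisfied[key] / clicked[key] if clicked[key] else 0.0
        scores[key] = attract * satisfy
    return scores


def to_qrel_lines(scores, threshold=0.5):
    """Turn scores into binary judgements, one TREC-style line each."""
    qids = {}
    lines = []
    for (query, doc), score in sorted(scores.items()):
        qid = qids.setdefault(query, len(qids) + 1)
        lines.append(f"{qid} Q0 {doc} {int(score >= threshold)}")
    return lines


if __name__ == "__main__":
    for line in to_qrel_lines(estimate_relevance(SESSIONS)):
        print(line)
```

Only sessions are used here, not users, which fits the suggestion above about
identifying the search session instead of the user.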

> Yes. But if you follow the walkthrough, it copies the uninstalled version
> of the omega CGI. omega is the CGI (I think).

Oh, I thought it'd be a .cgi file. Okay, so I just need to copy this omega
from /usr/local/lib/xapian-omega/bin to /usr/lib/cgi-bin and work with it.

Thanks,
Vivek


