Regarding Letor Click Data Mining Project

Olly Betts olly at survex.com
Fri Mar 17 05:58:38 GMT 2017


On Thu, Mar 02, 2017 at 05:19:56PM +0530, Gautam Dudeja wrote:
> I am interested in the Project: Learning to Rank Click Data Mining. From
> the project details given, I think the aim of the project is to make the
> Letor module more usable by providing the training data to it from the real
> time search results.
> The training data is to be generated from the click data which is basically
> the query-document pair.

That's some key information, but it's possible other data might be useful
- a few examples:

  * the rank of the clicked result in the result set
  * how long the user took to choose that result
  * if they came back and clicked on a different result
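
As a rough illustration of the signals listed above, here is a minimal Python
sketch of what one logged click might carry and how those three values could
be derived from it.  Everything in it is hypothetical: the ClickEvent fields
and the session_signals() helper are made up for this example, and nothing
like this exists in Xapian or omega today.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClickEvent:
    """One click on a search result (hypothetical record layout)."""
    query: str         # the query string the user searched for
    doc_id: str        # identifier of the clicked document
    rank: int          # 1-based position of the result in the result set
    shown_at: float    # time the result page was shown (seconds)
    clicked_at: float  # time the result was clicked (seconds)


def session_signals(events: List[ClickEvent]) -> List[Dict[str, object]]:
    """Derive per-click signals for one query within one user session."""
    ordered = sorted(events, key=lambda e: e.clicked_at)
    signals = []
    for i, event in enumerate(ordered):
        # Did the user come back and click a different result afterwards?
        later_other = any(e.doc_id != event.doc_id for e in ordered[i + 1:])
        signals.append({
            "doc_id": event.doc_id,
            "rank": event.rank,
            "seconds_to_click": event.clicked_at - event.shown_at,
            "followed_by_other_click": later_other,
        })
    return signals


if __name__ == "__main__":
    session = [
        ClickEvent("xapian", "doc7", 3, 100.0, 104.5),
        ClickEvent("xapian", "doc2", 1, 100.0, 130.0),
    ]
    for row in session_signals(session):
        print(row)
```

Which of these fields are actually worth logging would depend on the mining
approach that gets chosen.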

> I have gone through the format of Training data as provided in
> xapian-letor/docs/letor.rst.
> I want to know, Are we saving the click data/ search log somewhere?
> Please provide me some advice about my understanding of this project.

There isn't such a log currently.

What we're expecting for this project is that the applicant would look through
the academic literature and pick a promising-looking approach that's been
shown to work already (the GSoC timescale is really too short to develop a
fresh approach).

But what we need to log depends on what data the chosen approach needs.

Once that's known, we can define a sensible log format (or look to see if
there's an existing log format that would work).

I'd suggest putting most of the effort into the actual mining of the data,
but if the log format used isn't something already produced then you
probably need to allow time to prototype something that produces it so
the system can actually be tried out end-to-end.  For example, maybe
configure the omega CGI search front-end to produce such a log.

Cheers,
    Olly


