GSoC 2017: Letor Click Data Mining

James Aylett james at tartarus.org
Tue Mar 21 20:42:38 GMT 2017


On 21 Mar 2017, at 09:17, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > There's a $log{} command available in Omega templates. We can't log from
> > the result page template, as the clicks happen after that is used, but we
> > could make result links redirect via a second Omega template which does
> > the logging.
> 
> To make sure I understand correctly: we need a second Omega template, so that each
> result link in the opensearch template redirects via that template when it is clicked.
> Should our first step be to implement such a template, or does one already exist?
> Please correct me if I'm wrong.

Isn't this from the query template, i.e. from the main web page of search results? (It might make sense from opensearch as well, though.) We need some way of logging when people click on a search result, which you can build using a second OmegaScript template, as Olly suggested.

> I've been exploring the Omega codebase for the past few days but it would be great if
> you could elaborate a bit about how logging works internally. So far, I understand that
> $log command recursively calls the eval function in which it's defined and prints the
> returned string from eval to a log file.

You're overthinking things. Look at the documentation first:

> $log{LOGFILE[,ENTRY]}
> 
> write to the log file LOGFILE in directory log_dir (set in omega.conf). ENTRY is the OmegaScript for the log entry, and a linefeed is appended. If LOGFILE cannot be opened for writing, nothing is done (and ENTRY isn't evaluated). ENTRY defaults to a format similar to the Common Log Format used by webservers.

So the only thing you really need to know is the ENTRY format, so you can figure out how to log what you need. (Which you should identify before diving into code.)
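
To make "figure out what you need to log" concrete, here is a minimal sketch of the consuming side. It assumes a purely hypothetical ENTRY that writes one click per line as three tab-separated fields (query, document id, rank). That layout is an illustration only, not a format Omega defines, and `count_clicks` is a made-up helper name.

```python
from collections import Counter


def count_clicks(lines):
    """Count clicks per (query, docid) from tab-separated log lines.

    Assumed (hypothetical) line format: query<TAB>docid<TAB>rank
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that don't match the assumed format
        query, docid, _rank = fields
        counts[(query, docid)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "xapian letor\t42\t1\n",
        "xapian letor\t17\t2\n",
        "xapian letor\t42\t1\n",
    ]
    for (query, docid), n in sorted(count_clicks(sample).items()):
        print(query, docid, n)
```

Working backwards from what the consumer needs (here: query, document and rank) is one way to decide what the ENTRY should contain before touching any Omega code.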

> I've read the paper to understand the FairPairs algorithm. To implement it, we'd need to
> take the result links for each query from the opensearch template and feed them into the
> algorithm, which will rearrange the results using a uniform probability variable. The modified
> results can then be presented using the opensearch template, and clicks are recorded to adjust
> the relevance score of certain doc links.

You need to think more carefully about the layers involved here. We don't want to post-process the output of a template: we want to be able to render the template with the results rearranged.
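
For reference, the reordering step itself is small. The following is a minimal sketch of FairPairs as I understand it from the paper, operating on a plain list of document ids before anything is rendered. It does not use any Xapian or Omega API, and the names `fair_pairs` and `preference_from_click` are invented for illustration.

```python
import random


def fair_pairs(ranking, rng=random):
    """Return (presented, pairs) for one FairPairs reordering of `ranking`.

    Adjacent results are grouped into pairs starting at offset k, where k is
    0 or 1 with equal probability, and each pair is swapped with probability
    0.5. `pairs` lists (upper, lower) in the order actually presented.
    """
    results = list(ranking)
    k = rng.randint(0, 1)  # k == 1 leaves the top result unpaired
    pairs = []
    for i in range(k, len(results) - 1, 2):
        if rng.random() < 0.5:
            results[i], results[i + 1] = results[i + 1], results[i]
        pairs.append((results[i], results[i + 1]))
    return results, pairs


def preference_from_click(pairs, clicked):
    """Interpret a click on the lower result of a pair as a preference
    (lower, upper); any other click yields None."""
    for upper, lower in pairs:
        if clicked == lower:
            return (lower, upper)
    return None


if __name__ == "__main__":
    presented, pairs = fair_pairs(["d1", "d2", "d3", "d4", "d5"])
    print(presented)
    print(preference_from_click(pairs, presented[1]))
```

Note that the pairing has to be remembered alongside the presented order, since a click can only be interpreted once you know which pair (if any) the clicked result was in.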

Incidentally, this feels to me like it needs an MSet re-ordering system. So it may be worth looking at the discussion around doing this for Letor, which has a similar problem. This was the mailing list discussion initiated by Ayush (based on some previous IRC conversations, IIRC) as part of his Letor project last year: https://lists.xapian.org/pipermail/xapian-devel/2016-July/002981.html

> Also, I'm following the Omega example wiki page to set up Omega at present to get some
> first-hand experience with it. I have xapian-core and Omega installed on my system (I had
> them installed earlier but pulled the recent changes and installed again).

That page is ancient, so I hope you're actually installing the 1.4 series Xapian and Omega! This is the problem with overly-specific walkthroughs :-(

> But visiting localhost/cgi-bin/omega.cgi gives 500 Internal Server Error. Looking into the error log
> file (https://paste.debian.net/922929/) reveals that it doesn't have permission to create .libs
> directory

That looks to me like you haven't installed omega, but are trying to run with the development version (which is a libtool script that finds the right pieces and puts them together). That seems to be what the walkthrough tells you to do, which is unhelpful.

>  in /usr/lib/cgi-bin which shouldn't be the case as I set the permissions correctly using:
> 
> sudo chmod 755 /usr/lib/cgi-bin/omega.cgi

That is correct, but won't solve your problem. When you ran `make install` for omega, it will have copied the CGI somewhere, although I can't remember where; I'd guess /usr/local/lib/bin/xapian-omega by default, from eyeballing the Makefile.

> Also, I didn't see the below output while indexing the data using omindex.
> 
> [Entering directory /]
> Indexing "/ci_01.htm" as text/html ...  added.
> Indexing "/ci_02.htm" as text/html ...  added.
> ...

What did you get? Nothing? If nothing, you may not have followed the instructions on unpacking the sample (book) data properly.

More generally, I'd recommend reading the omega documentation (particularly around omindex) to understand what it does, rather than just following the walkthrough.

J

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/



