Logging the click data

Mon Jun 5 07:41:45 BST 2017

On 5 Jun 2017, at 05:17, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > ID: some identifier for each query
> > QUERY: text of the query (when the query is run)
> > URLs: every URL displayed (or alternatively, the Xapian docid — this
> > might be easier)
> > OFFSET: otherwise you'll have difficulty coping with result pages other
> > than the first page (when this happens, the query ID should probably
> > remain the same, and when you aggregate you can "glue" the different
> > pages together)
> 
> I'm not clear on what the OFFSET really represents. Could you please
> explain a bit?

Omega paginates results (as does Xapian's MSet, internally), so if you're displaying the second page of results, you'll need to know that when building training data. The offset is affected by TOPDOC and also by the <, >, [ and # CGI parameters, but internally to omega they are all mapped onto a single variable.

In omegascript, you can find this using $topdoc.
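
As a rough sketch of the "gluing" step at aggregation time (in Python, which is just my choice for illustration): assume each logged page has already been parsed into a (query ID, offset, docids) tuple, where the offset is the 0-based rank of the first result on that page, as I'd expect $topdoc to give you. The tuple layout and the name glue_pages are hypothetical, not anything omega provides.

```python
from collections import defaultdict

def glue_pages(entries):
    """Combine per-page log entries into one ranked docid list per query ID.

    entries: iterable of (query_id, offset, docids), where offset is the
    0-based rank of the first result on that page (an assumption).
    """
    by_query = defaultdict(dict)
    for query_id, offset, docids in entries:
        for position, docid in enumerate(docids):
            # Rank within the whole result list, not within the page.
            by_query[query_id][offset + position] = docid
    # Sort by rank so pages logged out of order still line up.
    return {
        query_id: [docid for _, docid in sorted(ranks.items())]
        for query_id, ranks in by_query.items()
    }

pages = [
    ("q1", 10, [31, 8, 15]),  # second page, logged first
    ("q1", 0, [4, 7, 9, 12, 2, 40, 41, 42, 43, 44]),
    ("q2", 0, [5, 6]),
]
glued = glue_pages(pages)
```

Note that if someone jumps from page 1 to page 3, this sketch silently closes the gap in ranks; you may prefer to keep the ranks explicit in that case.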

> And, I think we certainly need the CLICKS field as
> otherwise we can't capture the click information which is essential
> to training the click model. This field will need to be of same size
> and structure as URLs field (i.e. a list) e.g. [0,1,2,0,0] for 5 urls
> in the result page.

You will need to generate a file in the format you proposed from the two logging files.

> So, whenever a click occurs on the result page, we log the query
> ID and the clicked url via a different template which will be triggered
> with each click event

Yes.

> but I'm not sure how we will be able to capture the
> click information if we don't record the number of times each url was
> clicked in a separate CLICKS field?

If you have a log line for each time a particular result was clicked, then you can generate CLICKS by adding them up.
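
For example, here is a minimal sketch of the adding-up in Python. It assumes the click log has one tab-separated "QUERYID<TAB>DOCID" line per click; that line format and the function names are assumptions for illustration, not something omega defines.

```python
from collections import Counter

def count_clicks(click_lines):
    """Count clicks per (query ID, docid) from 'QUERYID<TAB>DOCID' lines."""
    counts = Counter()
    for line in click_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        query_id, docid = line.split("\t")
        counts[(query_id, int(docid))] += 1
    return counts

def clicks_field(query_id, docids, counts):
    """Build a CLICKS list in the same order as the displayed docids."""
    # A Counter returns 0 for results that were never clicked.
    return [counts[(query_id, docid)] for docid in docids]

click_log = ["q1\t7", "q1\t9", "q1\t9", "q2\t5"]
counts = count_clicks(click_log)
clicks = clicks_field("q1", [4, 7, 9, 12, 2], counts)
print(clicks)  # [0, 1, 2, 0, 0]
```

The result has the same length and order as the list of displayed docids, which is the structure you proposed for CLICKS.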

> Also, just to be sure, we will log
> such pairs of query ID and URL in separate files to be aggregated
> later into a single file?

Well…that's kind of a deployment question. I suggest logging the ID,URL (or QUERYID,DOCID) lines to a file separate from the one used to log the query details, because it's easier to think about and the code is slightly more straightforward. However, in the general case, if you have multiple webservers for your site, each is likely to log to its own file, and you'll later have to merge them all.
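
In the multiple-webserver case, the merge can be as simple as summing per-server click counts. A small Python sketch, assuming each server's click log has already been reduced to a Counter keyed by (query ID, docid); the names here are hypothetical:

```python
from collections import Counter

def merge_click_counts(per_server_counts):
    """Sum click counters produced separately for each webserver's log."""
    total = Counter()
    for counts in per_server_counts:
        total.update(counts)  # update() adds counts rather than replacing them
    return total

# Illustrative data only: two servers that each saw some of the clicks.
web1 = Counter({("q1", 7): 1, ("q1", 9): 1})
web2 = Counter({("q1", 9): 1, ("q2", 5): 3})
merged = merge_click_counts([web1, web2])
```

This only gives sensible totals if query IDs are unique across servers; otherwise clicks for unrelated queries would be added together.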

> In the end, we will have two files it seems -- one created from the
> query template containing separate entries for each executed search
> as per the format you described previously and another containing
> query IDs and click URLs logged using a different template?

Yes, that's right. I recommend logging Xapian docids instead of click URLs for the reason previously discussed.

> I also wanted to ask how does the log command ($log{query.log}) in
> the query template work.

It's documented (tersely) in the omegascript documentation. The format is:

$log{LOGFILE[,ENTRY]}

> It doesn't seem to comply with the format
> mentioned in its documentation as it expects two arguments but we
> provide only one here i.e. query.log and what does this argument
> mean?

The [] means that the second parameter is optional. Indeed, the documentation says:

> ENTRY defaults to a format similar to the Common Log Format used by webservers.

If you do provide ENTRY, it's more omegascript, which is evaluated to produce the string written to LOGFILE. (The documentation hints at this, but doesn't make it quite explicit.) See around line 140 of xapian-applications/omega/query.cc for how the default is implemented.

J

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/