Logging the click data

Mon Jun 5 05:17:17 BST 2017

Hi James,

> ID: some identifier for each query
> QUERY: text of the query (when the query is run)
> URLs: every URL displayed (or alternatively, the Xapian docid — this
> might be easier)
> OFFSET: otherwise you'll have difficulty coping with result pages other
> than the first page (when this happens, the query ID should probably
> remain the same, and when you aggregate you can "glue" the different
> pages together)

I'm not clear on what OFFSET really represents. Could you please
explain a bit? Also, I think we certainly need a CLICKS field, as
otherwise we can't capture the click information, which is essential
to training the click model. This field would need to be of the same
size and structure as the URLs field (i.e. a list), e.g. [0,1,2,0,0]
for five URLs in the result page.
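
To make the proposed structure concrete, here is a minimal Python
sketch of one search-log entry in which CLICKS is a list aligned
position by position with URLs. The class name, field names and
example URLs are hypothetical and only for illustration; this is not
Omega's actual log format, and OFFSET is left out because its meaning
is still open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchLogEntry:
    """One hypothetical search-log record; not Omega's real format."""
    query_id: str
    query: str
    urls: List[str]
    # clicks[i] is the number of clicks recorded for urls[i]
    clicks: List[int] = field(default_factory=list)

    def __post_init__(self):
        # Default to zero clicks for every displayed URL.
        if not self.clicks:
            self.clicks = [0] * len(self.urls)
        # CLICKS must have the same size and structure as URLs.
        if len(self.clicks) != len(self.urls):
            raise ValueError("clicks and urls must have the same length")


# Five displayed URLs; the second was clicked once, the third twice.
example_urls = ["http://example.org/%d" % i for i in range(5)]
entry = SearchLogEntry("q1", "xapian omega", example_urls, [0, 1, 2, 0, 0])
```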

> One would then be the clicks, so for each URL clicked in a result page,
> emit:
>
> ID: the query identifier that matches the entry in the search log
> URL: the URL redirected to (again, or the Xapian docid)
>
> This means you need to be able to generate ID for each query, and
> also that each clickable URL in the results page will need to go via the
> omega CGI using a different template whose job it is to log ID & URL
> to the click log and then redirect to URL. Once generated, the ID can
> be passed through from call to call (including on pagination)

So, whenever a click occurs on the result page, we log the query
ID and the clicked URL via a different template, which is triggered
by each click event. But I'm not sure how we will be able to capture
the click information if we don't record the number of times each URL
was clicked in a separate CLICKS field. Also, just to be sure: will we
log such pairs of query ID and URL in separate files, to be aggregated
later into a single file?

In the end, we will have two files it seems -- one created from the
query template containing separate entries for each executed search
as per the format you described previously and another containing
query IDs and click URLs logged using a different template?
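
If the two files do look like that, the aggregation step might be
sketched roughly as below. This assumes the search log has already
been parsed into (ID, QUERY, URLs) tuples and the click log into
(ID, URL) pairs; the function name, the tuple layout and the sample
values are all hypothetical, and pagination is ignored.

```python
from collections import Counter


def aggregate_clicks(search_log, click_log):
    """Join click-log pairs onto search-log entries by query ID.

    search_log: list of (query_id, query, urls) tuples
    click_log:  list of (query_id, url) tuples, one per click
    """
    # Count how many times each (query ID, URL) pair was clicked.
    counts = Counter(click_log)
    merged = []
    for query_id, query, urls in search_log:
        merged.append({
            "id": query_id,
            "query": query,
            "urls": urls,
            # One count per displayed URL, in display order.
            "clicks": [counts[(query_id, url)] for url in urls],
        })
    return merged


search_log = [("q1", "xapian omega", ["u1", "u2", "u3", "u4", "u5"])]
click_log = [("q1", "u2"), ("q1", "u3"), ("q1", "u3")]
merged = aggregate_clicks(search_log, click_log)
```

With these sample inputs the per-URL click counts for query q1 come
out as [0, 1, 2, 0, 0], i.e. the same shape as the CLICKS list.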

I also wanted to ask how the log command ($log{query.log}) in the
query template works. It doesn't seem to match the format given in
its documentation, which expects two arguments, whereas we provide
only one here, i.e. query.log. What does this argument mean?

Thanks,
Vivek