<div dir="ltr"><div class="gmail_extra"><div class="gmail_extra">Hi James,</div><div class="gmail_extra"><br></div><div class="gmail_extra">> ID: some identifier for each query</div><div class="gmail_extra">> QUERY: text of the query (when the query is run)</div><div class="gmail_extra">> URLs: every URL displayed (or alternatively, the Xapian docid — this</div><div class="gmail_extra">> might be easier)</div><div class="gmail_extra">> OFFSET: otherwise you'll have difficulty coping with result pages other</div><div class="gmail_extra">> than the first page (when this happens, the query ID should probably</div><div class="gmail_extra">> remain the same, and when you aggregate you can "glue" the different</div><div class="gmail_extra">> pages together)</div><div class="gmail_extra"><br></div><div class="gmail_extra">I'm not clear on what OFFSET actually represents. Could you please</div><div class="gmail_extra">explain a bit? Also, I think we certainly need a CLICKS field, as</div><div class="gmail_extra">otherwise we can't capture the click information, which is essential</div><div class="gmail_extra">to training the click model. This field will need to be of the same size</div><div class="gmail_extra">and structure as the URLs field (i.e. a list), e.g. 
[0,1,2,0,0] for 5 URLs</div><div class="gmail_extra">in the result page.</div><div class="gmail_extra"><br></div><div class="gmail_extra">> One would then be the clicks, so for each URL clicked in a result page,</div><div class="gmail_extra">> emit:</div><div class="gmail_extra">></div><div class="gmail_extra">> ID: the query identifier that matches the entry in the search log</div><div class="gmail_extra">> URL: the URL redirected to (again, or the Xapian docid)</div><div class="gmail_extra">></div><div class="gmail_extra">> This means you need to be able to generate ID for each query, and</div><div class="gmail_extra">> also that each clickable URL in the results page will need to go via the</div><div class="gmail_extra">> omega CGI using a different template whose job it is to log ID & URL</div><div class="gmail_extra">> to the click log and then redirect to URL. Once generated, the ID can</div><div class="gmail_extra">> be passed through from call to call (including on pagination)</div><div class="gmail_extra"><br></div><div class="gmail_extra">So, whenever a click occurs on the result page, we log the query</div><div class="gmail_extra">ID and the clicked URL via a different template, which will be triggered</div><div class="gmail_extra">with each click event. But I'm not sure how we will be able to capture the</div><div class="gmail_extra">click information if we don't record the number of times each URL was</div><div class="gmail_extra">clicked in a separate CLICKS field? 
Also, just to be sure, we will log</div><div class="gmail_extra">such pairs of query ID and URL in separate files, to be aggregated</div><div class="gmail_extra">later into a single file?</div><div class="gmail_extra"><br></div><div class="gmail_extra">So in the end, it seems we will have two files: one created from the</div><div class="gmail_extra">query template, containing a separate entry for each executed search</div><div class="gmail_extra">in the format you described previously, and another containing</div><div class="gmail_extra">query IDs and clicked URLs, logged using a different template?</div><div class="gmail_extra"><br></div><div class="gmail_extra">I also wanted to ask how the log command ($log{query.log}) in</div><div class="gmail_extra">the query template works. It doesn't seem to match the format</div><div class="gmail_extra">given in its documentation: that expects two arguments, but we</div><div class="gmail_extra">provide only one here, i.e. query.log. What does this argument</div><div class="gmail_extra">mean?</div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks,</div><div class="gmail_extra">Vivek</div></div></div>