Logging the click data

James Aylett james at tartarus.org
Wed Jun 7 23:17:40 BST 2017

On 6 Jun 2017, at 23:12, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > There's a lot of flexibility already, because the log format is just
> > omegascript. So I don't think you need to implement a new command to
> > achieve this. (Although you might need a command to generate the query
> > id. It depends on how you're going to do that.)
> Ok, I'll try adapting the existing log command to achieve the kind of logging
> we want.

In case I wasn't clear: I don't think you have to modify the command at all. Just create a template that uses the command as it currently works.

> As for the command to generate unique query ids, I've been thinking of
> tackling this as a kind of hashing problem, where we provide the query
> text as input and get a unique id as output. Although coming up with a
> 100% collision-free hashing algorithm for this task is something worth
> considering first.

Don't worry about collisions; it isn't a catastrophe if this collides sometimes (especially as you can detect when that happens), so any algorithm that's fairly fast should be fine. (MD5 would give ~22 base64 characters, which sounds fine to me; we already have an implementation in the omega source code, so I'd probably use that.)
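
To make the "~22 base64 characters" figure concrete, here is a minimal sketch in Python rather than omega's own C++ MD5 code. The function name query_id and the choice of the URL-safe base64 alphabet are assumptions for illustration only. An MD5 digest is 16 bytes, which base64 encodes as 24 characters including two '=' padding characters, so dropping the padding leaves 22.

```python
import base64
import hashlib


def query_id(text: str) -> str:
    """Return a 22-character id derived from the MD5 digest of `text`.

    Hypothetical helper: the name and the URL-safe alphabet are
    assumptions, not something omega provides.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()  # always 16 bytes
    encoded = base64.urlsafe_b64encode(digest)            # 24 bytes, ends in "=="
    return encoded.rstrip(b"=").decode("ascii")           # 22 characters


if __name__ == "__main__":
    print(query_id("example query"))
```

Collisions are possible in principle, but as noted above that isn't a catastrophe for logging.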

From the models you talked about, I assume you'll need to hash more than just the query text. I'm guessing you'd include something like the timestamp, and then pass the resulting id between different invocations of the CGI (both for click-throughs and for navigating around the query pages).
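
As a rough sketch of that idea (in Python, purely for illustration): hash the timestamp together with the query text once, when the query is first run, then carry the id along as a CGI parameter on result links and paging links. The parameter names "QID" and "P", and all the function names, are assumptions made up for this example, not existing omega behaviour.

```python
import base64
import hashlib
from urllib.parse import parse_qs, urlencode


def make_query_id(query: str, timestamp: int) -> str:
    """Hash the timestamp and query text together into a 22-character id."""
    # A NUL separator keeps the two fields from running into each other.
    material = f"{timestamp}\0{query}".encode("utf-8")
    digest = hashlib.md5(material).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def link_with_id(base_url: str, params: dict, qid: str) -> str:
    """Build a link that carries the id to the next CGI invocation."""
    return base_url + "?" + urlencode({**params, "QID": qid})


def read_query_id(query_string: str):
    """Recover the id from an incoming query string, or None if absent."""
    values = parse_qs(query_string).get("QID")
    return values[0] if values else None


if __name__ == "__main__":
    qid = make_query_id("example query", 1496873860)
    print(link_with_id("/search", {"P": "example query"}, qid))
```

The id is generated only once per query; later invocations read it back with something like read_query_id rather than rehashing, since the timestamp would have changed by then.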

> Other caveats include the maximum length of the generated unique id
> string, and whether we should strip leading whitespace from the query
> text to avoid "essentially the same" queries being recorded as
> different entries in the log file. What do you suggest?

Stripping whitespace at either end of the query string seems reasonable.
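
A small illustration of the effect, again in Python and with hypothetical function names: stripping both ends before recording means variants that differ only in surrounding whitespace collapse to a single entry, while whitespace inside the query is left alone.

```python
from collections import Counter


def normalise_query(text: str) -> str:
    """Strip whitespace from both ends; internal whitespace is untouched."""
    return text.strip()


def count_queries(raw_queries):
    """Count queries after normalisation, so trivial variants share an entry."""
    return Counter(normalise_query(q) for q in raw_queries)


if __name__ == "__main__":
    print(count_queries(["xapian", "  xapian", "xapian \t", "xapian omega"]))
```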


 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/
