Logging the click data
James Aylett
james at tartarus.org
Wed Jun 7 23:17:40 BST 2017
On 6 Jun 2017, at 23:12, Vivek Pal <vivekpal.dtu at gmail.com> wrote:
> > There's a lot of flexibility already, because the log format is just
> > omegascript. So I don't think you need to implement a new command to
> > achieve this. (Although you might need a command to generate the query
> > id. It depends on how you're going to do that.)
>
> Ok, I'll try adapting the existing log command to achieve the kind of logging
> we want.
In case I wasn't clear: I don't think you have to modify the command at all. Just create a template that uses the command as it currently works.
> And, about the command to generate unique query ids, I've been thinking
> to tackle this as a kind of hashing problem where we'll basically provide the
> query text as input to generate a unique id as output. Although, coming
> up with a 100% collision-free hashing algorithm for this task is something
> worth considering first.
Don't worry about collisions; it isn't a catastrophe if this collides sometimes (especially as you can detect when that happens), so any algorithm that's fairly fast should be fine. (MD5 would give ~22 base64 characters, which sounds fine to me; we already have an implementation in the omega source code, so I'd probably use that.)
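As a rough illustration of the sizes involved (this is a Python sketch, not the omega code, and the function names are made up for the example): an MD5 digest is 16 bytes, which base64-encodes to 24 characters including two padding characters, so 22 once the padding is dropped. Detecting a collision only needs a record of which input produced each id.

```python
import base64
import hashlib


def query_id(text: str) -> str:
    """Hypothetical helper: MD5 of the text as unpadded URL-safe base64."""
    digest = hashlib.md5(text.encode("utf-8")).digest()  # always 16 bytes
    # 16 bytes encode to 24 base64 characters, the last two being '='.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def record_and_check(seen: dict, text: str) -> bool:
    """Remember which text produced each id.

    Returns True if this id was already recorded for a *different* text,
    i.e. a collision has been detected.
    """
    qid = query_id(text)
    previous = seen.setdefault(qid, text)
    return previous != text
```

The URL-safe alphabet is an assumption here; it just avoids `+` and `/` if the id ends up in a URL.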
From the models you talked about, I assume you'll need to hash more than just the query text. I'm guessing you'd include something like the timestamp, then pass the resulting id between different invocations of the CGI (both for click-throughs and for navigating around the query pages).
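Something along these lines is what I have in mind, sketched in Python rather than omegascript. The parameter name `QID`, the function names and the example URL are all hypothetical; the point is only that the id is generated once from the query plus a timestamp, then carried along in the links instead of being recomputed.

```python
import base64
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit


def make_query_id(query: str, timestamp: int) -> str:
    """Hash timestamp and query text together (hypothetical scheme)."""
    payload = f"{timestamp}\n{query}".encode("utf-8")
    digest = hashlib.md5(payload).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def link_with_id(base_url: str, params: dict, qid: str) -> str:
    """Build a link (next page, click-through) that carries the id along."""
    return base_url + "?" + urlencode({**params, "QID": qid})


def id_from_link(url: str) -> str:
    """Recover the id in a later invocation instead of regenerating it."""
    return parse_qs(urlsplit(url).query)["QID"][0]
```

For example, `link_with_id("/omega", {"P": "cats"}, make_query_id("cats", 1496873860))` gives a URL from which `id_from_link` returns the same id, so every page of results and every click-through for that search logs against one id.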
> Other caveats include max length of the generated
> unique id string and whether we should truncate leading whitespaces from
> the query text to avoid "essentially same" queries from being recorded in
> different entries in the log file. What do you suggest?
Stripping whitespace at either end of the query string seems reasonable.
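A minimal Python sketch of that (the function name is made up): strip both ends before doing anything else with the query, and leave internal whitespace alone, since that may be significant (inside a quoted phrase, say).

```python
def normalise_query(raw: str) -> str:
    """Hypothetical helper: drop leading and trailing whitespace only."""
    return raw.strip()


# Queries that differ only in surrounding whitespace become identical,
# so they would be logged (and hashed) as the same query.
variants = ["  xapian omega", "xapian omega  ", "\txapian omega\n"]
normalised = {normalise_query(v) for v in variants}
```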
J
--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/