GSoC '17: Reintroducing myself
Olly Betts
olly at survex.com
Sun Mar 19 05:12:32 GMT 2017
Generally this all sounds sensible to me. A few comments below:
On Sat, Mar 11, 2017 at 08:22:35PM +0530, Vivek Pal wrote:
> 1. Raw click data can be obtained from Omega logs. If there's currently no
> functionality for that then the very first step will be to implement a
> logging facility in Omega, or maybe even a standalone proxy-log server, to
> record the click data.[1] We'd need different functionalities in that
> logging facility to extract the following type of information depending
> upon the mining technique we choose to employ:
There's a $log{} command available in Omega templates. We can't log from
the result page template, because the clicks happen after that template
has been evaluated, but we could make result links redirect via a second
Omega template which does the logging.
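To make the data flow of that redirect concrete, here is a minimal Python
sketch. It is not Omega code and not how Omega implements anything: the
/click endpoint, the parameter names (q, docid, rank, url) and the CSV log
layout are all assumptions made up for illustration. In a real setup the
second Omega template would do the logging step with $log{}.

```python
import csv
import io
import time
from urllib.parse import urlencode, urlparse, parse_qs


def click_url(base, query, docid, rank, target):
    """Build the link shown on the result page (hypothetical parameter names)."""
    params = {"q": query, "docid": docid, "rank": rank, "url": target}
    return base + "?" + urlencode(params)


def log_click(url, out, now=None):
    """Record one click as a CSV row, then return the URL to redirect to."""
    params = parse_qs(urlparse(url).query)
    timestamp = int(now if now is not None else time.time())
    target = params["url"][0]
    # Assumed log layout: time, query, docid, rank shown, target URL.
    csv.writer(out).writerow(
        [timestamp, params["q"][0], params["docid"][0], params["rank"][0], target]
    )
    return target


if __name__ == "__main__":
    link = click_url("/click", "cave survey", 42, 3, "http://example.org/doc?id=1")
    log = io.StringIO()
    print(log_click(link, log))
    print(log.getvalue(), end="")
```

A real redirect handler should also check the target against the result
set before redirecting, so that it can't be abused as an open redirector.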
> But position bias may strongly affect the accuracy of pairwise preference
> learning so we need a position bias free method for the learning task.
> Radlinski and Joachims [3] gave "a simple method to modify the presentation
> of search results that provably gives relevance judgements that are
> unaffected by presentation bias" called the simple FairPairs algorithm. The
> modified search result is presented to the user and click data is extracted
> and mined thereafter.
So for that you'd also need to implement this result modification, and
then use that new feature from Omega.
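As a rough sketch of what that result modification involves, assuming my
reading of the FairPairs description in [3] is right: pick an offset of 0
or 1 at random, pair up adjacent results starting from that offset, flip
each pair with probability 0.5, and later count a click on the lower
result of a pair as a preference for it over the one above. The function
names and the rng parameter below are hypothetical, and none of this is
existing Xapian or Omega API.

```python
import random


def fair_pairs(results, rng=random):
    """Return (presented, pairs).

    presented is a reordered copy of results; pairs lists the (upper, lower)
    positions in presented that were considered for flipping.
    """
    presented = list(results)
    k = rng.randint(0, 1)  # pairs start at position 0 or position 1
    pairs = []
    for i in range(k, len(presented) - 1, 2):
        if rng.random() < 0.5:  # flip this adjacent pair half of the time
            presented[i], presented[i + 1] = presented[i + 1], presented[i]
        pairs.append((i, i + 1))
    return presented, pairs


def preferences_from_clicks(presented, pairs, clicked_positions):
    """Return (preferred, other) tuples for clicks on the lower result of a pair."""
    clicked = set(clicked_positions)
    prefs = []
    for upper, lower in pairs:
        if lower in clicked:
            prefs.append((presented[lower], presented[upper]))
    return prefs


if __name__ == "__main__":
    shown, considered = fair_pairs(["d1", "d2", "d3", "d4", "d5"])
    print(shown, considered)
    print(preferences_from_clicks(shown, considered, [1]))
```

The pairs that were considered need to be logged alongside the clicks,
since preferences can only be extracted for those pairs.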
> There are also several sequential click models that assume there is no
> position bias, but that doesn't sound like a good solution, so I think
> it's best to focus on preference pair learning models.
That indeed seems an unrealistic assumption, though I guess what really
matters is how effective these models are in practice (after all, models
are almost inherently simplifications of reality).
> The most fundamental question still remains unanswered for me after going
> through all these papers: how are the final binary relevance judgements
> assigned to the docs in the search results? I think once we have the
> relevance judgements for the Qrel file, we are pretty much done, as the rest,
> starting from generating a training file, is handled by letor itself.
Yes, that seems the appropriate boundary with the existing xapian-letor
module.
Cheers,
Olly