GSoC Project Guidance

James Aylett james at tartarus.org
Sun Feb 18 11:33:44 GMT 2018


On 15 Feb 2018, at 23:34, Neil Paul <androneil54 at gmail.com> wrote:

>          I am Indraneil Paul, a final year Computer Science student at IIIT Hyderabad. I wish to work on the project Learning to Rank: Clickstream Data Mining. I have an avid interest in ML and IR, having done an ML related GSoC project last year as well under Green Navigation, and would like to go ahead with working on this project.
>          
>          In order to get a basic understanding of clickstream mining I have studied a few links:
> 1. An introductory blog post (https://www.blendo.co/blog/clickstream-data-mining-techniques-introduction/)
> 2. A more thorough review (https://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf)
>      
>          I would like to start corresponding with potential mentors regarding a viable set of timed realistic goals w.r.t. implementing an end-to-end learning to rank model and/or implementing the EM based training method for the DBN model. It would be great if I could get some pointers regarding how to go about this next.

Hi, Indraneil. If you haven't already, you should first go through our guide for potential students (https://trac.xapian.org/wiki/GSoC%20Guide). You need an idea of how the core of Xapian works, and working through the guide will give you that.

Then you should look in particular at the work done last year on clickstream-based training for letor. There are a number of specific suggestions in the project entry in the ideas list (https://trac.xapian.org/wiki/GSoCProjectIdeas#Project:LearningtoRankClickstreamDataMining), and what you'll want to do is to flesh out _how_ you'd tackle each of the aspects of it.

For instance, what are the key elements of end-to-end use of letor in omega? Think in terms of a user of omega and what they'd need in order to be able to use this functionality. You may find Amazon's product definition process helpful here (https://www.allthingsdistributed.com/2006/11/working_backwards.html).

Come up with a sensible order to put the different aspects in, and estimate how long each might take. A number of previous projects (which you can find on our wiki: https://trac.xapian.org/wiki/GSoC) link to their project proposal, so I'd recommend you read a few of those through once you've looked at our guidance. (The guidance contains suggestions of the sort of thing we care about in a proposal.)

J

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/



