GSoC

Sat Apr 6 14:58:41 BST 2019

Hi Александр, and welcome to Xapian!

I'm not quite sure what question you're asking, but hopefully the following is helpful.

First, you should look at the wiki pages of the previous project that started this work: <https://trac.xapian.org/wiki/GSoC2017/LetorClickstream>. They cover what was planned, what happened (there's a week-by-week journal, which we ask all our students to keep during GSoC), and a write-up (the "work product"), which is what was submitted to Google at the end. The write-up says what was completed and what was left unmerged (although the PR mentioned has since been completed and merged), and gives some ideas for future work, which should be read alongside the project description on our main page (there's some overlap). This will answer some of your questions, and most immediately:

1. You ask about the advanced training method. This is in the paper Vivek based his work on. The paper is linked from the project description, and its ACM page is linked from Vivek's project plan.

2. Other training methods are mentioned by name, and you should be able to find them in an academic literature search. If you don't have access to an academic library, then let us know, but if you do then we expect you to be able to hunt down references (and indeed to find new options through a search: IR is a field with ongoing work in both academia and industry, and new ideas and research appear every year).
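
For orientation, the "simple counting" style of training can be sketched roughly as below. This is a minimal sketch of the simplified DBN estimate (every result down to the last click is assumed to have been examined), not Xapian's actual code. The function name `train_sdbn` and the session format are hypothetical, smoothing priors are left out, and skipping sessions with no clicks is just one possible convention.

```python
from collections import defaultdict


def train_sdbn(sessions):
    """Estimate simplified-DBN parameters by counting.

    sessions: iterable of (doc_ids, clicks) pairs, where doc_ids is the
    ranked result list and clicks is a parallel list of booleans.
    Returns {doc_id: (attractiveness, satisfaction, relevance)}.
    """
    examined = defaultdict(int)      # times a document was examined
    clicked = defaultdict(int)       # times it was clicked
    last_clicked = defaultdict(int)  # times it was the last click

    for doc_ids, clicks in sessions:
        click_ranks = [i for i, c in enumerate(clicks) if c]
        if not click_ranks:
            continue  # assumption: sessions without clicks are skipped
        last = click_ranks[-1]
        # Everything up to and including the last click counts as examined.
        for rank in range(last + 1):
            doc = doc_ids[rank]
            examined[doc] += 1
            if clicks[rank]:
                clicked[doc] += 1
                if rank == last:
                    last_clicked[doc] += 1

    params = {}
    for doc, n in examined.items():
        attractiveness = clicked[doc] / n
        satisfaction = last_clicked[doc] / clicked[doc] if clicked[doc] else 0.0
        params[doc] = (attractiveness, satisfaction, attractiveness * satisfaction)
    return params
```

The EM and forward-backward approach in the paper replaces these hard counts with expected counts, so that the "examined" assumption no longer has to be fixed in advance.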

For the timeline itself, this is something we expect you to write the first draft of. We can provide feedback, but you need to demonstrate that you understand the problem and possible solutions well enough to propose a timeline of work across the summer. This is a key skill in breaking down and planning larger software projects, as well as in working autonomously in the way that Open Source projects typically happen.

You asked in another email about examples of Letor and Omega use. Right now, because this project is not complete, there are no examples of their use together. It's worth reading through the documentation for Omega (including articles linked on the wiki) to get a feel for how people use it. Then the "end to end" part of this project would be to think about how Letor should be incorporated into that: the steps people will have to take, and what software and documentation is missing for people to be able to do so. The documentation in Vivek's work product should provide a good starting point for that. It gets you from a running Omega with Letor to a trained Letor model that can be used to rerank query results, but that reranking is not currently available in Omega.

Finally, you asked about qualifying tasks — these don't have to be completed by April 9th, since the assessment period for applications continues until May. A couple of things come to mind:

 * pick any small task or bug as suggested in our guidance notes (this would be preferred)
 * open a WIP ("work in progress", i.e. not to be merged) pull request that changes Omega to rerank based on a trained Letor model. This would just be a quick approach rather than one that satisfies the project (for instance it wouldn't include configuration, so it would always try to rerank), but would enable you to get familiar with Omega and with Letor.
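
To give a feel for the shape of that second option, the reranking step itself is conceptually small. The following is only a language-neutral sketch in Python, not Omega or Letor code: `rerank`, `score` and `top_k` are hypothetical names, and reranking only the first `top_k` results is an assumption rather than something the project requires.

```python
def rerank(results, score, top_k=10):
    """Reorder the first top_k results by a model score, highest first.

    results: documents in their original ranked order.
    score:   callable returning the trained model's score for a document
             (hypothetical stand-in for a Letor model).
    Results beyond top_k keep their original order.
    """
    head = sorted(results[:top_k], key=score, reverse=True)
    return head + results[top_k:]
```

The real work in the pull request would be finding where Omega produces its result set and wiring the trained model in at that point.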

J

> On 6 Apr 2019, at 08:59, Александр Слесарев <alexander.g.slesarev at gmail.com> wrote:
> 
> Hi! Can you give some additional info about  "Learning to Rank Clickstream Data Mining/Currently, DBN click model training is based on a simple counting algorithm. There's an advanced version of training method given by a combination of EM and forward-backward algorithm in the paper which is worth having." to help me make a time schedule?

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/