[Xapian-devel] GSoC 2015 - Weighting Schemes

James Aylett james-xapian at tartarus.org
Sun Mar 8 10:17:47 GMT 2015

On 2 Mar 2015, at 11:55, Ayush Tomar <ayushtomar at gmail.com> wrote:

> I'm Ayush Tomar, a junior undergrad in Computer Science from New Delhi, India. I love C++ coding and working on machine learning and information retrieval projects. I was exploring the GSoC ideas for Xapian and the project on "Adding Weighting Schemes" looked really interesting to me. I wanted to work on text mining/IR this summer and this idea seems perfect!

Hi, Ayush. Xapian hasn’t been accepted as a mentoring organisation for GSoC this year. However, if you’re interested in working on this (or any other) project outside GSoC, then we can still provide the same support we would have as part of GSoC.

> I have gone through the getting started guide and have started to understand how the Xapian code is connected and to identify which parts I need to focus on for the project. I have started to research what similar and new schemes could be added to Xapian. It'll be a great help if someone could suggest a weighting scheme on which I should focus for the entry level task.

Many of the weighting schemes we’d like to add require tracking more statistics, so they aren’t really entry-level tasks (this may not be true for some of the DfR ones we don’t have yet). You could perhaps improve branch coverage in the tests for the weighting schemes (see http://lcov.xapian.org/latest/weight/index.html). For instance, it looks like there are various automatic adjustments of smoothing parameters in LMWeight that aren’t tested under all conditions (see http://lcov.xapian.org/latest/weight/lmweight.cc.gcov.html; anything in orange isn’t being tested, and will require writing a careful unit test to exercise it and ensure it does what it’s supposed to). Alternatively, any other small project will get you working with the code (and give you an opportunity to get one or two small commits in, which is valuable for getting familiar with PRs to Xapian, how commits should look &c).

If you aren’t able to get involved with Xapian outside GSoC, then of course you can ignore all of this, but hopefully you’ll be able to in some way either over the summer or at some other time! Just shout out if you need any help.


 James Aylett, occasional trouble-maker
