[Xapian-devel] GSOC 2015

James Aylett james-xapian at tartarus.org
Sun Mar 8 10:43:20 GMT 2015


On 25 Feb 2015, at 18:36, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:

> For GSOC 2015, I would like to work on Hiemstra's language modelling and LDA based relevance language modelling for the project idea 'Weighting schemes for Xapian'.

Richhiey, apologies for not replying to you sooner. Xapian hasn’t been accepted as a mentoring organisation for GSoC this year. However, we’re still happy to provide the same mentoring and support for anyone who wants to work on Xapian this summer (or at any time), so if you’re able and still interested, it’d be great to develop these weighting schemes into a concrete plan so you can work on them.

The key here is going to be identifying any information we don’t currently track that will be needed when calculating the weight of a document for a particular query. I haven’t had a chance to look more than briefly at the two papers. For LDA, it looks like there’s going to be something around the topic distributions (assuming this approach can be coerced into a suitable shape for Xapian’s weighting mechanism). For the parsimonious LM, it looks like there’s at least some post-index iterative calculation, with related storage requirements. (It may be that going back to the work that introduced parsimonious language modelling, which I assume is the Sparck-Jones et al. paper[1], will suggest other specific approaches within Xapian’s framework.)

[1] K. Sparck-Jones, S.E. Robertson, D. Hiemstra, and H. Zaragoza: Language modelling and relevance (http://www.cl.cam.ac.uk/archive/ksj21/ksjdigipapers/langmodbook03.pdf)
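
To make the "post-index iterative calculation" a little more concrete, here is a minimal sketch of the EM-style update commonly given for parsimonious language models, written in plain Python rather than against Xapian's API. Everything named here is an assumption for illustration: the function name, the parameters (lam, iterations, threshold), and the use of plain dictionaries for term frequencies and collection probabilities. It is not taken from Xapian and may differ in detail from the papers.

```python
def parsimonious_lm(term_freqs, collection_probs, lam=0.1,
                    iterations=20, threshold=1e-4):
    """Sketch of a parsimonious document model (hypothetical helper).

    term_freqs:       {term: count within the document}
    collection_probs: {term: P(term | collection)}
    lam:              weight of the document model in the mixture (assumed)
    threshold:        terms whose probability falls below this are dropped
    """
    total = sum(term_freqs.values())
    if total == 0:
        return {}

    # Start from the maximum-likelihood document model.
    doc_probs = {t: tf / total for t, tf in term_freqs.items()}

    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document
        # model rather than to the background collection model.
        expected = {}
        for t, p_doc in doc_probs.items():
            p_coll = collection_probs.get(t, 0.0)
            expected[t] = (term_freqs[t] * lam * p_doc
                           / ((1 - lam) * p_coll + lam * p_doc))

        # M-step: renormalise, dropping terms that fall below the threshold.
        norm = sum(expected.values())
        kept = {t: e / norm for t, e in expected.items()
                if e / norm >= threshold}
        if not kept:
            break
        kept_total = sum(kept.values())
        doc_probs = {t: p / kept_total for t, p in kept.items()}

    return doc_probs


if __name__ == "__main__":
    tf = {"the": 10, "of": 7, "xapian": 3}
    coll = {"the": 0.5, "of": 0.3, "xapian": 0.0001}
    for term, prob in sorted(parsimonious_lm(tf, coll).items()):
        print(term, round(prob, 4))
```

The loop needs collection-wide term probabilities, so it can only run once indexing has gathered them, and its output is a per-document set of term probabilities that would have to be stored somewhere to be available at query time. That is the kind of extra storage requirement mentioned above.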

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



