[Xapian-devel] GSOC 2015
James Aylett
james-xapian at tartarus.org
Sun Mar 8 10:43:20 GMT 2015
On 25 Feb 2015, at 18:36, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:
> For GSOC 2015, I would like to work on Hiemstra's language modelling and LDA-based relevance language modelling for the project idea 'Weighting schemes for Xapian'.
Richhiey — apologies for not replying to you sooner. Xapian hasn’t been accepted as a mentoring organisation for GSoC this year; however we’re still happy to provide the same mentoring and support for anyone who wants to work on Xapian this summer (or at any time), so if you’re able and still interested, it’d be great to develop these weighting schemes into a concrete plan so you can work on them.
The key here is going to be identifying any information we don’t currently track that will be needed when calculating the weight of a document for a particular query. I haven’t had a chance to look more than briefly at the two papers. For LDA it looks like there’s going to be something around the topic distributions (assuming this approach can be coerced into a suitable shape for Xapian’s weighting mechanism). For the parsimonious LM it looks like there’s at least some post-index iterative calculation, with related storage requirements. (It may be that going back to the work that introduced parsimonious language modelling, which I assume is the Sparck-Jones et al. paper[1], will suggest other specific approaches within Xapian’s framework.)
[1] K. Sparck-Jones, S.E. Robertson, D. Hiemstra, and H. Zaragoza: Language modelling and relevance (http://www.cl.cam.ac.uk/archive/ksj21/ksjdigipapers/langmodbook03.pdf)
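To make the "what information is needed" question above a little more concrete, here is a minimal Python sketch of a query-likelihood score with linear-interpolation smoothing, which is one common form of language-model weighting. It is not taken from either paper and is not Xapian's API: the function name, its parameters and the smoothing weight lam are all hypothetical, chosen only to show which per-document and collection-wide statistics such a scheme consumes.

```python
import math


def lm_log_score(query_terms, doc_term_freqs, doc_length,
                 coll_term_freqs, coll_length, lam=0.5):
    """Log query likelihood under a linearly interpolated language model.

    Hypothetical helper for illustration only. Each argument is a
    statistic that would have to be available at weighting time:
      doc_term_freqs  - term -> frequency within this document
      doc_length      - total number of terms in this document
      coll_term_freqs - term -> frequency across the whole collection
      coll_length     - total number of terms in the collection
      lam             - weight given to the document model (assumed value)
    """
    score = 0.0
    for term in query_terms:
        # Maximum-likelihood estimate from the document itself.
        p_doc = doc_term_freqs.get(term, 0) / doc_length if doc_length else 0.0
        # Background estimate from the whole collection.
        p_coll = coll_term_freqs.get(term, 0) / coll_length
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # Term unseen in both document and collection.
            return float("-inf")
        score += math.log(p)
    return score
```

Anything a new scheme needs beyond inputs of this kind (for example per-document topic distributions for LDA, or the result of an iterative post-index pass for the parsimonious model) is the sort of extra stored information the plan would need to identify.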
J
--
James Aylett, occasional trouble-maker
xapian.org