[Xapian-devel] DFR framework as a GSOC project

aarsh shah aarshkshah1992 at gmail.com
Fri Mar 15 16:32:24 GMT 2013


Hey guys, hi. :) I've finished implementing the PL2 scheme. The bounds I
have implemented for it are as tight as I could make them, given the nature
of the scheme and my mathematical skills. However, tight bounds for the other
named DFR schemes should be easier to implement because their formulas are
much simpler than PL2's. I will send in a pull request in a couple of days,
once I'm done with the tests and the documentation.

I'll now start working on the DPH scheme, as described in Section 3 here:
http://trec.nist.gov/pubs/trec18/papers/uglasgow.BLOG.ENT.MQ.RF.WEB.pdf

Now that GSOC is coming near, I want to start working on my proposal to make
it as detailed as possible. My aim is to implement document weighting
and query expansion using the DFR framework (currently, we have a hard-coded
formula for query expansion). I hope to complete as many named DFR schemes
as I can before the application period starts, so that during GSOC I can
focus on implementing the DFR framework, which will allow the user to create
any DFR scheme they want, and on implementing query expansion using
the DFR framework. I hope to be able to do the following work by the end
of GSOC:

1.) About 8 frequently used named DFR schemes: those mentioned on the Terrier
homepage and those mentioned by Olly on IRC. Each of these will be an
independent weighting scheme subclassed from Xapian::Weight.
2.) A DFR framework which allows the creation of any DFR scheme by choosing
a probabilistic model, a risk/gain normalization and a term frequency
normalization.
3.) Query expansion using the DFR schemes, allowing the user
to choose any named scheme to expand the query (just like we do for MSet).
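
To illustrate item 2.), here is a rough, standalone sketch of how a DFR scheme
could be assembled from three interchangeable parts. It deliberately does not
use Xapian::Weight or any real Xapian API: TermStats, DfrScheme and make_InL2
are hypothetical names made up for this example. The components shown are the
standard I(n) model, the Laplace after-effect (the risk/gain normalization)
and term frequency normalization 2, which together give the InL2 scheme; the
parameter c is the usual free parameter of normalization 2.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical per-term, per-document statistics (not Xapian's real API).
struct TermStats {
    double tf;         // within-document frequency of the term
    double doc_len;    // length of this document
    double avg_len;    // average document length in the collection
    double coll_size;  // N: number of documents in the collection
    double doc_freq;   // n: number of documents containing the term
};

// Term frequency normalization 2: tfn = tf * log2(1 + c * avg_len / doc_len).
inline double normalization2(const TermStats& s, double c) {
    assert(s.doc_len > 0.0);
    return s.tf * std::log2(1.0 + c * s.avg_len / s.doc_len);
}

// Probabilistic model I(n): Inf1 = tfn * log2((N + 1) / (n + 0.5)).
inline double model_In(double tfn, const TermStats& s) {
    return tfn * std::log2((s.coll_size + 1.0) / (s.doc_freq + 0.5));
}

// Laplace after-effect L (risk/gain normalization): 1 / (tfn + 1).
inline double after_effect_L(double tfn, const TermStats&) {
    return 1.0 / (tfn + 1.0);
}

// A DFR scheme is the product of the after-effect and the model's
// informative content, both evaluated on the normalized frequency tfn.
struct DfrScheme {
    std::function<double(double, const TermStats&)> basic_model;
    std::function<double(double, const TermStats&)> after_effect;
    std::function<double(const TermStats&)> tf_normalization;

    double weight(const TermStats& s) const {
        const double tfn = tf_normalization(s);
        return after_effect(tfn, s) * basic_model(tfn, s);
    }
};

// InL2 = I(n) model + Laplace after-effect + normalization 2.
inline DfrScheme make_InL2(double c) {
    return DfrScheme{model_In, after_effect_L,
                     [c](const TermStats& s) { return normalization2(s, c); }};
}
```

Swapping any one of the three members for a different function would give a
different scheme, which is the kind of flexibility item 2.) is after. A real
implementation would also have to provide upper bounds on the weight, which
this sketch leaves out.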

I hope to be able to finish at least 50% of 1.) before April ends so that I
can focus on 2.) and 3.) during the summer.

Please do comment on this and let me know what you think. Also, I have no
prior experience with writing proposals, so could you please tell me what
a proposal for something like this should include? I'd really appreciate
your help.

-Regards
-Aarsh