[Xapian-devel] GSoC 2011 Weighting Schemes

Olly Betts olly at survex.com
Mon Mar 28 06:28:47 BST 2011


On Mon, Mar 28, 2011 at 10:24:30AM +0800, wuwenjin wrote:
> I am very interested in the "Weighting Schemes" project, and in the last few
> days I have learnt some details about the DFR model family by reading some
> papers and web pages. I find that the Terrier Project (http://terrier.org/)
> has implemented most of the DFR schemes in Java

Yes, Terrier implements a number of DfR schemes.

> and I have briefly read the related
> source of Terrier's package (org.terrier.matching.models). I think the
> "weighting schemes" project can imitate that package, of course in C++.

Xapian's weighting schemes are structured in a particular way to allow
for various optimisations.  You need to subclass Xapian::Weight and
implement its methods, so I would suggest just starting from the
formulae.  Trying to directly translate Terrier's code to C++ will give
you Java-esque C++ code which doesn't actually fit where you need it.

You can see the Xapian::Weight API here:

http://trac.xapian.org/browser/trunk/xapian-core/include/xapian/weight.h

> It would be better to
> implement a generic DFR weighting model allowing any DFR scheme to be
> generated and evaluated, since DFR is a framework or model family which
> contains many basic models and different normalizations.

An interesting idea.

Although it's a family of weighting schemes, I suspect you'd find you
ended up switching between different implementations for each DfR scheme
internally, and that's better done by subclassing really.

But probably worth thinking about further.
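
As a rough sketch of the subclassing approach, working from the formulae
rather than from Terrier's code: the example below is standalone and does
not use the real Xapian::Weight API.  The names TermStats, DfrSketch,
InL2Sketch and IFB2Sketch, and the parameter c, are hypothetical and only
there for illustration.  The formulae are the InL2 and IFB2 schemes as
usually given in the DfR literature (normalisation 2, with the Laplace and
Bernoulli after-effects respectively), so check them against the papers
before relying on them.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical bundle of collection statistics; not a Xapian type.
struct TermStats {
    double num_docs;     // N: number of documents in the collection
    double doc_freq;     // n: number of documents containing the term
    double coll_freq;    // F: occurrences of the term in the collection
    double avg_doc_len;  // average document length
};

// Hypothetical base class standing in for the role Xapian::Weight plays.
// The shared parts live here; each scheme supplies only what differs.
class DfrSketch {
  public:
    explicit DfrSketch(const TermStats& s, double c_ = 1.0)
        : stats(s), c(c_) {}
    virtual ~DfrSketch() {}

    // Normalisation 2: tfn = tf * log2(1 + c * avg_len / len).
    double normalised_tf(double tf, double doc_len) const {
        assert(doc_len > 0.0);
        return tf * std::log2(1.0 + c * stats.avg_doc_len / doc_len);
    }

    // Per-term score: basic model multiplied by the after-effect.
    double score(double tf, double doc_len) const {
        double tfn = normalised_tf(tf, doc_len);
        return basic_model(tfn) * after_effect(tfn);
    }

  protected:
    virtual double basic_model(double tfn) const = 0;
    virtual double after_effect(double tfn) const = 0;

    TermStats stats;
    double c;  // free parameter of normalisation 2 (assumed default 1.0)
};

// InL2: inverse document frequency model with Laplace after-effect.
class InL2Sketch : public DfrSketch {
  public:
    explicit InL2Sketch(const TermStats& s, double c_ = 1.0)
        : DfrSketch(s, c_) {}

  protected:
    double basic_model(double tfn) const override {
        return tfn * std::log2((stats.num_docs + 1.0) /
                               (stats.doc_freq + 0.5));
    }
    double after_effect(double tfn) const override {
        return 1.0 / (tfn + 1.0);
    }
};

// IFB2: inverse term frequency model with Bernoulli after-effect.
class IFB2Sketch : public DfrSketch {
  public:
    explicit IFB2Sketch(const TermStats& s, double c_ = 1.0)
        : DfrSketch(s, c_) {}

  protected:
    double basic_model(double tfn) const override {
        return tfn * std::log2((stats.num_docs + 1.0) /
                               (stats.coll_freq + 0.5));
    }
    double after_effect(double tfn) const override {
        return (stats.coll_freq + 1.0) / (stats.doc_freq * (tfn + 1.0));
    }
};
```

Constructing, say, InL2Sketch from a TermStats and calling score(tf, doc_len)
gives the per-term contribution.  Each scheme is a small subclass with no
switch on a scheme identifier inside score(), which is the trade-off
against a single generic DfR class that would have to branch internally.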

Cheers,
    Olly


