[Xapian-devel] [Developing and Starting with Xapian]

Olly Betts olly at survex.com
Fri Jan 18 06:25:37 GMT 2013


On Thu, Dec 27, 2012 at 01:16:03AM +0530, Abhishek Shah wrote:
> Thanks Dan and Gaurav for the suggestion. It was interesting to read the
> project ideas and weighting schemes and learning to rank seemed
> interesting. I was familiar with unigram language modeling and BM25 and got
> a little more familiarity with the bigram language modeling referring to
> the project plan of Gaurav. I have gone through the basic project idea
> which tells that I need to implement some other weighting schemes like
> Divergence From Randomness. I have gone through the tutorial of Divergence
> from randomness and understood the theory. I would like to code and try out
> different urn randomness models. Can you suggest as to how to proceed for
> contributing in the weighting schemes project idea?

Sorry this didn't get a response earlier (I missed it until now due to
the holidays).  I think we may have discussed it on IRC since, but
here's a response which will help those searching the list archives in
the future even if it's no longer useful to you.

There's an example of writing your own simple weighting scheme here:

http://getting-started-with-xapian.readthedocs.org/en/latest/howtos/weighting_scheme.html#custom-weighting-schemes

What that page doesn't currently cover is that if your subclass uses
any statistics, you need to tell Xapian which ones so it knows to make
them available.  You do this by calling need_stat() in the constructor
of your subclass, with the enum value(s) here:

http://trac.xapian.org/browser/trunk/xapian-core/include/xapian/weight.h#L36

For example, this is what BM25Weight does:

http://trac.xapian.org/browser/trunk/xapian-core/include/xapian/weight.h#L396
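
To make the mechanism concrete, here's a rough, self-contained sketch of
the pattern in C++.  It deliberately does not include or link against
Xapian: WeightSketch, wants() and ToyLengthNormWeight are made-up
stand-ins, and the flag values are arbitrary (the flag names merely echo
ones in weight.h, so check the header linked above for the real list).
A real subclass would derive from Xapian::Weight and implement its
virtual methods as well.

```cpp
#include <cstdint>

// Stand-in for Xapian::Weight. NOT the real class: it only models how
// need_stat() records which statistics a weighting scheme wants.
class WeightSketch {
  public:
    // Hypothetical flags; the numeric values here are arbitrary.
    enum stat_flags : std::uint32_t {
        COLLECTION_SIZE = 1,
        AVERAGE_LENGTH = 2,
        TERMFREQ = 4,
        WDF = 8,
        DOC_LENGTH = 16
    };

    virtual ~WeightSketch() {}

    // Hypothetical query method: whatever computes the statistics could
    // use this to skip the ones nobody asked for.
    bool wants(stat_flags flag) const {
        return (stats_needed & flag) != 0;
    }

  protected:
    // Called from a subclass constructor, once per statistic needed.
    void need_stat(stat_flags flag) {
        stats_needed |= flag;
    }

  private:
    std::uint32_t stats_needed = 0;
};

// A made-up scheme that normalises wdf by document length, so it asks
// for exactly the three statistics it would read.
class ToyLengthNormWeight : public WeightSketch {
  public:
    ToyLengthNormWeight() {
        need_stat(WDF);
        need_stat(DOC_LENGTH);
        need_stat(AVERAGE_LENGTH);
    }
};
```

With this sketch, a ToyLengthNormWeight object reports wants(WDF),
wants(DOC_LENGTH) and wants(AVERAGE_LENGTH) as true, and everything else
as false.  The point of declaring needs up front is that statistics
nobody requested don't have to be computed.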

Cheers,
    Olly


