[Xapian-devel] GSoC 2011 Weighting Schemes

Olly Betts olly at survex.com
Mon Apr 4 03:07:39 BST 2011


On Wed, Mar 30, 2011 at 08:35:14PM +0800, wuwenjin wrote:
> *Q1:* What is the purpose of
>   virtual Xapian::weight get_maxpart() const = 0;
> and
>   virtual Xapian::weight get_maxextra() const = 0;
> ? When are these methods called?

If we have upper bounds on the components of the weight (bounds which
hold for any document in the database being searched), then we can
perform various optimisations based on the weights of documents we have
already seen.

One of the simpler examples: if we are ordering by relevance, as the
match progresses the minimum weight needed to make it into the result
set rises.  If the query is an OR, then at some point we know that both
sides will need to match to give us a large enough weight, and we can
change the OR to an AND.
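
To make the OR-to-AND case concrete, here is a minimal sketch of the
test involved.  It is not Xapian's matcher code: or_can_become_and() and
its parameters are hypothetical names, and it assumes a document enters
the result set when its weight is at least min_weight.

```cpp
#include <cassert>

// Hypothetical helper, not part of Xapian.
// max_left / max_right: upper bounds on the weight each side of the OR
// can contribute to any document (the role get_maxpart() plays for a
// single term).
// min_weight: weight a document currently needs to enter the result set
// (assumed here to be an inclusive threshold).
bool or_can_become_and(double max_left, double max_right, double min_weight) {
    assert(max_left >= 0.0 && max_right >= 0.0);
    // If neither side can reach the threshold alone, only documents
    // matching both sides can still qualify, so AND is equivalent.
    return max_left < min_weight && max_right < min_weight;
}
```

With bounds of 3.0 and 2.0, the OR has to stay an OR while min_weight is
2.5 (the left side alone can still qualify), but can be treated as an
AND once min_weight has risen to 3.5.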

> *Q2:* In Xapian, BM25Weight is the default weighting method. I want to
> know when, where and how BM25Weight is used in Xapian's source code.
> Maybe this question involves a lot of code. I think that weighting
> happens after the query terms are submitted, during the match, for
> example in multimatch.cc:
>  void
>  MultiMatch::get_mset(...)
> But this method is quite complex, and I am not sure about it.

Each term becomes a leaf node of the postlist tree, and has a Weight
object associated with it, which contributes get_sumpart() for each
document that term matches.  If get_maxextra() > 0, there's also a
Weight object which contributes get_sumextra().
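
As a rough illustration of how these pieces combine into a document's
weight, here is a toy sketch.  ToyWeight, MatchedTerm and
document_weight() are hypothetical stand-ins, not Xapian's classes; the
method names mirror Xapian::Weight but the signatures are simplified and
the formulas are made up purely so that the stated bounds hold.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a weighting scheme (not Xapian::Weight itself).
struct ToyWeight {
    double factor;       // hypothetical per-term scale
    double extra_scale;  // hypothetical term-independent scale

    // Term-dependent part; always below get_maxpart().
    double get_sumpart(unsigned wdf) const {
        return factor * wdf / (wdf + 1.0);
    }
    double get_maxpart() const { return factor; }

    // Term-independent part; never above get_maxextra().
    double get_sumextra(unsigned doclen) const {
        return extra_scale / (1.0 + doclen);
    }
    double get_maxextra() const { return extra_scale; }
};

// One matching query term: its weight object plus the within-document
// frequency of the term in the document being scored.
struct MatchedTerm {
    ToyWeight wt;
    unsigned wdf;
};

// Sum the per-term parts, then add the extra part only if it can be
// non-zero.
double document_weight(const std::vector<MatchedTerm>& terms,
                       const ToyWeight& extra, unsigned doclen) {
    double total = 0.0;
    for (const MatchedTerm& t : terms) {
        total += t.wt.get_sumpart(t.wdf);
    }
    if (extra.get_maxextra() > 0.0) {
        total += extra.get_sumextra(doclen);
    }
    return total;
}
```

The get_maxextra() check reflects the point above: when the extra
component is bounded by zero, there is no need to evaluate it at all.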

Cheers,
    Olly


