[Xapian-devel] GSoC 2011 Weighting Schemes
Olly Betts
olly at survex.com
Mon Apr 4 03:07:39 BST 2011
On Wed, Mar 30, 2011 at 08:35:14PM +0800, wuwenjin wrote:
> *Q1:* What is the purpose of
> "virtual Xapian::weight get_maxpart() const = 0;" and
> "virtual Xapian::weight get_maxextra() const = 0;"?
> When are these methods called?
If we have upper bounds on the components of the weight (bounds which
hold for any document in the database being searched), then we can
perform various optimisations based on the weights of documents we have
already seen.
One of the simpler examples: if we are ordering by relevance, as the
match progresses the minimum weight needed to make it into the result
set rises. If the query is an OR, then at some point we know that
neither side alone can give a large enough weight, so both sides will
need to match, and we can change the OR to an AND.
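To make that concrete, here is a minimal sketch of the test involved. It is not Xapian's actual matcher code: the function name and its parameters are made up for illustration, with left_max and right_max standing in for the upper bounds each branch of the OR can contribute, and min_weight for the current minimum weight needed to enter the result set.

```cpp
// Hypothetical helper, not part of Xapian's API.
// left_max / right_max: upper bounds on the weight each OR branch can
// contribute to any document.
// min_weight: weight a document currently needs to enter the result set.
inline bool can_decay_or_to_and(double left_max, double right_max,
                                double min_weight)
{
    // If either branch could reach min_weight on its own, a document
    // matching only that branch might still qualify, so keep the OR.
    bool left_alone_enough = left_max >= min_weight;
    bool right_alone_enough = right_max >= min_weight;

    // Only when neither branch is enough by itself must both match,
    // which is exactly what an AND requires.
    return !left_alone_enough && !right_alone_enough;
}
```

With bounds of 2.0 and 3.0, the OR has to stay while min_weight is 2.5 (the right branch alone could still qualify), but once min_weight reaches 3.5 the node can be treated as an AND.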
> *Q2:* In Xapian, BM25Weight is the default weighting method. I want to know
> when, where and how BM25Weight is used in Xapian's source code. Maybe this
> question involves a lot of code. I think that weighting happens after the
> query terms are submitted, during the match, for example in multimatch.cc
> in "void MultiMatch::get_mset(...)"? But this method is quite complex and
> I am not sure about it.
Each term becomes a leaf node of the postlist tree, and has a Weight
object associated with it, which contributes get_sumpart() for each
document that term matches. If get_maxextra() > 0, there's also a
Weight object which contributes get_sumextra().
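As a rough sketch of how those pieces add up for one document: the structs below are simplified stand-ins, not Xapian's Weight class. The field names only mirror the method names in this thread, and the real methods take per-document statistics as arguments rather than returning stored values.

```cpp
#include <vector>

// Hypothetical stand-in for the Weight object on one term's leaf node.
struct ToyTermWeight {
    double sumpart;  // this term's contribution for the current document
    double maxpart;  // upper bound on sumpart over all documents
};

// Hypothetical stand-in for the extra, term-independent Weight object.
struct ToyExtraWeight {
    double sumextra;  // term-independent contribution for this document
    double maxextra;  // upper bound on sumextra over all documents
};

// Weight of one document: the per-term parts of the terms that matched,
// plus the extra part when it can be non-zero at all.
inline double document_weight(const std::vector<ToyTermWeight>& matching,
                              const ToyExtraWeight& extra)
{
    double total = 0.0;
    for (const auto& term : matching) total += term.sumpart;
    if (extra.maxextra > 0.0) total += extra.sumextra;
    return total;
}

// Upper bound on the weight of any document for an OR of these terms.
inline double max_possible_weight(const std::vector<ToyTermWeight>& terms,
                                  const ToyExtraWeight& extra)
{
    double bound = extra.maxextra;
    for (const auto& term : terms) bound += term.maxpart;
    return bound;
}
```

The second function is the kind of bound the matcher can compare against the minimum weight needed to make it into the result set.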
Cheers,
Olly