[Xapian-devel] What does collection_freq mean?
Olly Betts
olly at survex.com
Wed Aug 28 04:02:01 BST 2013
On Wed, Aug 28, 2013 at 10:13:36AM +0800, jiangwen jiang wrote:
> http://xapian.org/docs/sourcedoc/html/classXapian_1_1Weight_1_1Internal.html
> Xapian::doccount collection_size
> Number of documents in the collection.
> What's the difference between collection_size and
> doccount (Xapian::doccount get_doccount() const;)?
They're the same thing.
> 2. On this page, http://xapian.org/docs/bm25.html
>
> $$\frac{(k_3+1)\,q}{k_3+q} \cdot \frac{(k_1+1)\,f}{k_1 L + f} \cdot \log\frac{(r+0.5)(N-n-R+r+0.5)}{(n-r+0.5)(R-r+0.5)}$$
>
> f is the wdf, the within-document frequency,
>
> But in the code, BM25Weight::get_maxpart() uses
> double wdf_max(get_wdf_upper_bound()). What's the difference
> between f (the wdf) and wdf_max?
> If they are not the same, why is wdf_max used?
>
> Really appreciate your help!
wdf_max is an upper bound on the wdf of a particular term in any
document in the database.
get_maxpart() returns an upper bound on what get_sumpart() can return
for any document for the term represented by that Weight object.
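To see why substituting wdf_max yields a valid bound, here is a minimal Python sketch of the per-term BM25 formula quoted above, together with a bound derived from it. It is not Xapian's code: bm25_sumpart() and bm25_maxpart() are hypothetical names. The sketch assumes k1 >= 0, q > 0, and a known lower bound L_min on the normalised document length L. The wdf factor (k1+1)f/(k1L+f) never decreases as f grows and never increases as L grows. Evaluating it at (wdf_max, L_min) therefore bounds every document the term occurs in, provided the log (idf) factor is non-negative. If that factor is negative, every contribution is at most 0.

```python
import math


def bm25_sumpart(f, L, q, N, n, R=0, r=0, k1=1.0, k3=1.0):
    """Hypothetical helper: one term's contribution to one document's weight,
    following the quoted formula (f = wdf, q = wqf, L = normalised document
    length, N = collection size, n = term frequency, R/r = relevance counts)."""
    if f == 0:
        return 0.0  # term absent from this document: no contribution
    query_part = (k3 + 1) * q / (k3 + q)
    wdf_part = (k1 + 1) * f / (k1 * L + f)
    idf_part = math.log((r + 0.5) * (N - n - R + r + 0.5)
                        / ((n - r + 0.5) * (R - r + 0.5)))
    return query_part * wdf_part * idf_part


def bm25_maxpart(wdf_max, L_min, q, N, n, R=0, r=0, k1=1.0, k3=1.0):
    """Hypothetical helper: upper bound on bm25_sumpart() over every document
    whose wdf is at most wdf_max and whose normalised length is at least L_min.

    Assumes k1 >= 0, q > 0 and L_min >= 0. The wdf factor is non-decreasing
    in f and non-increasing in L, so with a non-negative idf factor the
    maximum is at (wdf_max, L_min). With a negative idf factor every
    contribution is <= 0, so 0 is a valid bound; max() covers both cases.
    """
    best = bm25_sumpart(wdf_max, L_min, q, N, n, R, r, k1, k3)
    return max(best, 0.0)
```

For example, bm25_sumpart(f, L, 1, 1000, 10) never exceeds bm25_maxpart(wdf_max=5, L_min=0.5, q=1, N=1000, n=10) for any f <= 5 and L >= 0.5. Substituting wdf_max alone, with L left unconstrained, would not be enough, because a very short document raises the wdf factor.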
Cheers,
Olly