[Xapian-devel] What does collection_freq means?

Olly Betts olly at survex.com
Wed Aug 28 04:02:01 BST 2013


On Wed, Aug 28, 2013 at 10:13:36AM +0800, jiangwen jiang wrote:
> http://xapian.org/docs/sourcedoc/html/classXapian_1_1Weight_1_1Internal.html
>     Xapian::doccount collection_size
>                               Number of documents in the collection.
>     What's the difference between collection_size and
> doccount (Xapian::doccount get_doccount() const;)?

They're the same thing.

> 2 On this page, http://xapian.org/docs/bm25.html
>
>     (k3+1)q/(k3+q) · (k1+1)f/(k1L+f)
>         · log[(r+0.5)(N-n-R+r+0.5) / ((n-r+0.5)(R-r+0.5))]
>
>    f is the wdf, the within document frequency,
> 
>    But in the code of BM25Weight::get_maxpart(),
>    double wdf_max(get_wdf_upper_bound()) is used. What's the difference
> between f (wdf) and wdf_max?
>    If they are not the same, why is wdf_max used?
> 
> Really appreciate your help!

wdf_max is an upper bound on the wdf of a particular term in any
document in the database.

get_maxpart() returns an upper bound on what get_sumpart() can return
for any document for the term represented by that Weight object, so it
uses wdf_max rather than the wdf (f) of one specific document.
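
To illustrate why substituting wdf_max gives a valid bound, here is a
minimal sketch in C++. It is not Xapian's actual code: wdf_factor(),
wdf_factor_bound(), bound_holds() and the Doc struct are hypothetical
names, and only the wdf-dependent factor (k1+1)f/(k1L+f) of the formula
quoted above is modelled. The sketch assumes k1 >= 0 and L > 0, in which
case the factor grows with f and shrinks with L, so evaluating it at the
largest wdf and the smallest normalised length cannot be exceeded by any
document.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not Xapian API): the wdf-dependent BM25 factor
// (k1+1)f / (k1*L + f), where L is the normalised document length.
// Assumes k1 >= 0 and L > 0 so the denominator is positive.
double wdf_factor(double f, double k1, double L) {
    return (k1 + 1.0) * f / (k1 * L + f);
}

// Hypothetical bound: the factor is non-decreasing in f and non-increasing
// in L, so plugging in the largest wdf and the smallest normalised length
// gives a value no single document can exceed.
double wdf_factor_bound(double wdf_max, double k1, double L_min) {
    return wdf_factor(wdf_max, k1, L_min);
}

// Illustrative per-document statistics for one term.
struct Doc {
    double wdf;  // within-document frequency of the term
    double L;    // normalised document length
};

// Checks that every document's factor is <= the bound derived from the
// collection-wide wdf_max and L_min.
bool bound_holds(const std::vector<Doc>& docs, double k1) {
    if (docs.empty()) return true;
    double wdf_max = 0.0;
    double L_min = docs.front().L;
    for (const Doc& d : docs) {
        wdf_max = std::max(wdf_max, d.wdf);
        L_min = std::min(L_min, d.L);
    }
    const double bound = wdf_factor_bound(wdf_max, k1, L_min);
    return std::all_of(docs.begin(), docs.end(), [&](const Doc& d) {
        return wdf_factor(d.wdf, k1, d.L) <= bound;
    });
}
```

The bound is usually not reached by any real document, since the
document with the largest wdf need not be the shortest one. That is
fine for the purpose described above: it only has to be an upper bound.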

Cheers,
    Olly


