[Xapian-discuss] Incorrect get_matches_estimated() of Xapian::Mset

Olly Betts olly at survex.com
Thu Sep 20 15:36:45 BST 2007


On Thu, Sep 20, 2007 at 10:18:07PM +0800, Hightman wrote:
>   Though I used the third argument to set the check_at_least number, it still
>   returns a much larger number than the exact number (about 5 times);
> 
>   But if the exact number is greater than the estimated number, the new return
>   value is more correct.
> 
>   Doesn't the third argument fix the number greater than the exact number?

I don't really understand that question.

What the checkatleast parameter specifies is the minimum number of
documents which the matcher will look at.  By default we try to
minimise this number, while still returning correct results, as
that makes searches faster.

If there are fewer matches than this, then get_matches_estimated(),
get_matches_lower_bound() and get_matches_upper_bound() will all
return the same answer, which will be the exact number of matches.

So if you want to show 10 page buttons and have 10 hits per page,
pass 101 as checkatleast (the extra 1 allows you to tell the
difference between "exactly 100 hits" and "more than 100 hits").
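
To make that arithmetic concrete, here is a minimal sketch.  The helper
names are hypothetical (they are not part of Xapian); the only real API
assumed is that checkatleast is the third argument of
Xapian::Enquire::get_mset(first, maxitems, checkatleast).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Xapian): the value to pass as
// checkatleast when the UI shows at most `pages` page buttons with
// `hits_per_page` results on each page.
std::size_t check_at_least_for(std::size_t pages, std::size_t hits_per_page) {
    assert(hits_per_page > 0);
    // The extra 1 separates "exactly pages * hits_per_page hits"
    // from "more than that".
    return pages * hits_per_page + 1;
}

// Hypothetical helper: how many page buttons to draw for a given match
// count, capped at `pages`.
std::size_t page_buttons(std::size_t matches, std::size_t pages,
                         std::size_t hits_per_page) {
    assert(hits_per_page > 0);
    // Round up so a partly filled last page still gets a button.
    std::size_t needed = (matches + hits_per_page - 1) / hits_per_page;
    return needed < pages ? needed : pages;
}
```

With 10 buttons and 10 hits per page, check_at_least_for(10, 10) gives
101, and any match count the matcher reports below 101 is exact, so
page_buttons() can be fed get_matches_estimated() directly.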

If there are more matches than checkatleast, then get_matches_estimated()
won't necessarily be exact, but because the matcher may have looked at
more documents than it otherwise would, it may be a better estimate.  You
can look at get_matches_lower_bound() and get_matches_upper_bound() to
see how far off the estimate could be.
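
As a rough sketch of how you might use the bounds: the struct and
function names below are hypothetical, and the three fields simply stand
in for the values returned by get_matches_lower_bound(),
get_matches_estimated() and get_matches_upper_bound().  It assumes the
estimate always lies between the two bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical holder for the three counts reported by an MSet.
struct MatchCounts {
    std::size_t lower;      // from get_matches_lower_bound()
    std::size_t estimated;  // from get_matches_estimated()
    std::size_t upper;      // from get_matches_upper_bound()
};

// When the bounds coincide, the estimate is the exact number of matches.
bool is_exact(const MatchCounts& c) {
    return c.lower == c.upper;
}

// Largest amount by which the estimate could differ from the true count.
std::size_t max_estimate_error(const MatchCounts& c) {
    assert(c.lower <= c.estimated && c.estimated <= c.upper);
    return std::max(c.estimated - c.lower, c.upper - c.estimated);
}
```

A display layer could then print "42 results" when is_exact() is true
and "about 500 results" otherwise.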

Note that most search engines estimate the number of matches for
reasons of performance (e.g. Google usually says "Results 1 - 10 of
about 781,000").

Cheers,
    Olly


