[Xapian-discuss] Matches estimate varies with sorting method

Olly Betts olly at survex.com
Wed Oct 17 01:07:36 BST 2007


On Tue, Oct 16, 2007 at 09:50:29PM +0800, Fabrice Colin wrote:
> I found that the figure returned by MSet::get_matches_estimated() varies
> depending on how results are to be sorted.

This in itself isn't a bug - it is after all an estimate!

> For instance, in my index, value 4 contains date and time in the format
> "yyyymmddhhmmss". For the same query, the number of results will be
> estimated to 20000+ when results are first sorted by date and time
> with set_sort_by_value_then_relevance(4) and to only 100 if I use
> set_sort_by_relevance(). The first figure is the correct one.

You're likely to get a more accurate estimate when sorting by value,
since the matcher generally has to consider more documents in that case.

> Note that the MSet is obtained with Enquire::get_mset(0, 100, 101), so that
> probably explains where the 100 comes from.

But this sounds wrong.  If "checkatleast" is 101, get_matches_estimated()
should only be less than 101 if the estimate is exact.

What are the corresponding values of get_matches_lower_bound() and
get_matches_upper_bound() in the two cases?
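
To make that expectation concrete, here is a minimal sketch of the check
I have in mind.  It doesn't use Xapian itself: MatchCounts and
looks_consistent() are hypothetical names made up for illustration, and
you'd fill in the three fields from the MSet's lower bound, estimate and
upper bound.  The upper bound of 25000 in the example is invented too,
standing in for the "20000+" figure.

```cpp
#include <cassert>

// Hypothetical holder for the three counts an MSet reports
// (lower bound, estimate, upper bound).  Not a Xapian type.
struct MatchCounts {
    unsigned lower;
    unsigned estimated;
    unsigned upper;
};

// Returns true if the counts look consistent with the matcher having
// been asked to check at least `check_at_least` documents.
bool looks_consistent(const MatchCounts& c, unsigned check_at_least) {
    // The estimate should always lie between the two bounds.
    if (c.lower > c.estimated || c.estimated > c.upper) return false;

    // An estimate below checkatleast should only happen when the matcher
    // ran out of matching documents, in which case the count is exact.
    if (c.estimated < check_at_least) {
        return c.lower == c.estimated && c.estimated == c.upper;
    }
    return true;
}

// Illustrative values only, not taken from a real index.
void example_checks() {
    // Exactly 100 matches in total: an estimate of 100 is fine.
    assert(looks_consistent(MatchCounts{100, 100, 100}, 101));
    // Estimate of 100 while far more documents could match: suspicious.
    assert(!looks_consistent(MatchCounts{100, 100, 25000}, 101));
}
```

If the relevance-sorted case fails a check like this while the
value-sorted case passes, that would point at the bounds rather than
just the estimate.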

Does this also happen with SVN HEAD?  There have been some
matcher-related changes, but nothing specifically addressing this that
I'm aware of.

And can you supply a recipe to reproduce this easily?

> The estimate will also be correct with set_sort_by_relevance_then_value(4).
> 
> If I am not mistaken, a similar problem was reported, and apparently fixed,
> back in September :
> http://comments.gmane.org/gmane.comp.search.xapian.general/5110
> 
> I am using 1.0.3.

That fix would have made it into 1.0.3, so I don't think it can be the
exact same issue.

Cheers,
    Olly


