[Xapian-discuss] Matches estimate varies with sorting method

Fabrice Colin fabrice.colin at gmail.com
Wed Oct 17 13:11:05 BST 2007


On 10/17/07, Olly Betts <olly at survex.com> wrote:
> On Tue, Oct 16, 2007 at 09:50:29PM +0800, Fabrice Colin wrote:
> > I found that the figure returned by MSet::get_matches_estimated() varies
> > depending on how results are to be sorted.
>
> This in itself isn't a bug - it is after all an estimate!
>
> > For instance, in my index, value 4 contains date and time in the format
> > "yyyymmddhhmmss". For the same query, the number of results will be
> > estimated at 20000+ when results are first sorted by date and time
> > with set_sort_by_value_then_relevance(4) and at only 100 if I use
> > set_sort_by_relevance(). The first figure is the correct one.
>
> You're likely to get a more accurate estimate when sorting since the
> matcher generally has to consider more documents when sorting.
>
That's fair enough. I am still surprised the figures are so wildly different.

> > Note that the MSet is obtained with Enquire::get_mset(0, 100, 101), so that
> > probably explains where the 100 comes from.
>
> But this sounds wrong.  If "checkatleast" is 101, get_matches_estimated()
> should only be less if the estimate is exact.
>
> What are the corresponding values of get_matches_min() and
> get_matches_max() in the two cases?
>
When sorting by date, the estimate is 20424, the lower bound is 7735
and the upper bound is 40848, which is the number of documents in my index.
When sorting by relevance, all three figures are 100.
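Olly's rule above (with checkatleast set to 101, an estimate below 101 should only be reported when the count is exact) can be written as a simple check on the three figures an MSet reports. The sketch below uses two hypothetical helpers written for illustration only. They are not part of the Xapian API. Each takes the lower bound, estimate and upper bound as plain integers, and the values plugged in are the ones quoted in this message.

```python
def bounds_consistent(lower, estimated, upper, check_at_least):
    """Hypothetical helper, not part of Xapian: sanity-check the three
    match-count figures reported by one MSet."""
    # The estimate must lie between the lower and upper bounds.
    if not (lower <= estimated <= upper):
        return False
    # An estimate below check_at_least means every match was counted,
    # so the count should be exact and all three figures equal.
    if estimated < check_at_least:
        return lower == estimated == upper
    return True


def bounds_overlap(a, b):
    """Hypothetical helper: two (lower, upper) ranges for the same query
    must share at least one value if both are correct."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Figures reported above, both from get_mset(0, 100, 101).
by_date = (7735, 20424, 40848)
by_relevance = (100, 100, 100)

print(bounds_consistent(*by_date, 101))       # True
print(bounds_consistent(*by_relevance, 101))  # True
print(bounds_overlap((by_date[0], by_date[2]),
                     (by_relevance[0], by_relevance[2])))  # False
```

Each run passes on its own, so the relevance-sorted MSet is not internally inconsistent. The problem is that it claims an exact count of 100, which cannot be reconciled with the lower bound of 7735 reported for the same query when sorting by date.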

The query I am testing with is a range query on another value. Does this matter?

In my particular case, I would actually prefer having an estimate that is
as close as possible to the number of matches.

> Does this also happen with SVN HEAD?  There have been some
> matcher-related changes, but nothing specifically addressing that I'm
> aware of.
>
I haven't tried with SVN HEAD. I'll take a look and report back.

> And can you supply a recipe to reproduce this easily?
>
Okay. I'll try to do that before the end of the week.

Thanks!

Fabrice


