[Xapian-discuss] Result count before fetch

Olly Betts olly at survex.com
Tue May 19 04:28:23 BST 2009


On Sun, May 17, 2009 at 09:09:34PM +0200, Jesper Krogh wrote:
> Ivo Jansch - Ibuildings wrote:
> > To be able to use the Xapian results in a Zend_Paginator, I would like 
> > to retrieve the amount of matches _before_ I retrieve them using 
> > get_mset. Is this possible?
> > 
> > I thought about using $enquier->get_mset(0,1)->get_matches_estimated()

You can actually ask for no matches (i.e. get_mset(0,0)) and get an
estimate without doing much work at all, but it generally won't be as
accurate as you'll get by actually doing a match, and the estimate tends
to improve the more documents you ask for.
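A minimal sketch of the estimate-only call, assuming the Python bindings' MSet methods get_matches_lower_bound(), get_matches_estimated() and get_matches_upper_bound(). The helper name estimate_only is hypothetical. It takes any enquire object that already has its query set, so it can be exercised without a real database:

```python
def estimate_only(enquire):
    """Hypothetical helper: ask for zero matches, which is cheap but may
    give a rougher estimate than a real match would."""
    # first=0, maxitems=0: no documents are returned, only the counts.
    mset = enquire.get_mset(0, 0)
    return (
        mset.get_matches_lower_bound(),
        mset.get_matches_estimated(),
        mset.get_matches_upper_bound(),
    )
```

With the real bindings this would be called after something like enquire = xapian.Enquire(db) and enquire.set_query(query).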

You can look at the lower and upper bounds (get_matches_lower_bound()
and get_matches_upper_bound() on the MSet) to see how far off the
estimate could be.  If the query is a single term and there's no
collapsing, matchdecider or cutoff, the estimate will always be exact.
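A small sketch of reading those bounds, assuming only the three MSet count methods named above. describe_estimate is a hypothetical name. Because lower <= estimated <= upper always holds, equal bounds mean the count is exact:

```python
def describe_estimate(mset):
    """Hypothetical helper: summarise how far the match-count
    estimate could be off, using the MSet's bounds."""
    lower = mset.get_matches_lower_bound()
    estimated = mset.get_matches_estimated()
    upper = mset.get_matches_upper_bound()
    if lower == upper:
        # Bounds coincide, so the count is known exactly (e.g. a
        # single-term query with no collapsing, decider or cutoff).
        return f"exactly {estimated}"
    return f"about {estimated} (between {lower} and {upper})"
```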

> > but that would mean an extra get_mset call per pageload, which would 
> > seem inefficient.
> 
> Someone with more insight than me might give a better answer, but I 
> think the answer is no, since the estimate is done in the process of
> finding the results.

Indeed - I'm unclear on the scenario, but I'd suggest just reading the
MSet when you want the estimate and storing it in your "Paginator" for
when you want the results.
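A sketch of that suggestion: a hypothetical XapianPage class (not part of Xapian or Zend_Paginator) that makes one get_mset call per page load and serves both the pager's total and the page's hits from that same MSet. It assumes, as with the Python bindings, that an MSet can be iterated to get its matches:

```python
class XapianPage:
    """Hypothetical helper: one get_mset call per page load, reused for
    both the total-count estimate and the hits on the current page."""

    def __init__(self, enquire, page, per_page):
        self.page = page
        self.per_page = per_page
        # The only match run; the estimate comes along with the results.
        self._mset = enquire.get_mset((page - 1) * per_page, per_page)

    def total_estimate(self):
        return self._mset.get_matches_estimated()

    def page_count(self):
        # Ceiling division without floats.
        return -(-self.total_estimate() // self.per_page)

    def hits(self):
        # Iterating an MSet yields its match items.
        return list(self._mset)
```

Since the page count is built from an estimate, it may shift slightly as the user moves between pages.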

> It passes through the index, so if it has got about 10% of the way
> through when it finds the 10 matches you requested, then the estimate
> would be that there are 100 in the total set, although it would not be
> accurate unless you requested more than were actually available, so
> that you forced it to visit all hits.

It's somewhat more complex than just scaling up like this, but it's true
that for a given query, considering more documents will tend to improve
the estimate.
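One way to make the matcher consider more documents without returning more is the checkatleast argument of get_mset (the third positional parameter, after first and maxitems). A minimal sketch; the helper name and the default of 1000 are illustrative assumptions, not values suggested in this thread:

```python
def page_with_better_estimate(enquire, first, per_page, check_at_least=1000):
    """Hypothetical helper: fetch one page, but ask the matcher to check
    at least `check_at_least` candidates so the count tends to improve."""
    # get_mset(first, maxitems, checkatleast): only `per_page` hits come
    # back, but more candidates are examined (slower, more accurate).
    return enquire.get_mset(first, per_page, check_at_least)
```

Setting it at least as high as the number of documents in the database should force every match to be counted, at the cost of doing all that work on every page load.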

See also:

http://trac.xapian.org/wiki/FAQ/MoreAccurateEstimates

Cheers,
    Olly


