[Xapian-discuss] Revision 11671 cursory observations wrt sort performance

Sun Dec 7 23:57:50 GMT 2008

On Sat, Dec 06, 2008 at 06:21:39PM +0200, Henry wrote:
> Quoting myself:
> > b)  *Is* Xapian sorting through all 11-15k results above?  With
> > performance an issue when sorting, I wonder:  I seem to vaguely recall
> > an index search approach which roughly did the following:  since the
> > user will only ever possibly view (say) 1000 results, why bother
> > grinding through all 1 million results (or 10-15k in my tests above)
> > to sort, etc?  ie, only gather and collate those results (say, 1000)
> > with the highest scores (or those which have a particular 'field'
> > above a certain threshold), discarding the rest, but still returning a
> > "hit" total of X for display/informational purposes only... or is
> > Xapian already doing this?
> 
> Apologies for answering myself:  if I understand the docs correctly,  
> Enquire::get_mset() looks like the way to go.  It seems to estimate  
> totals without considering all matches

By default it does the least work it can to return the set of results
requested.

> but calling the decision  
> function mdecider() for every match sounds expensive (in Perl at least).

I suspect calling back to Perl has a fair amount of overhead, but I've
not measured it.  At some point the MatchDecider call will be done more
lazily, but that's not yet been implemented.

But I don't see how filtering on a threshold helps at all.  Firstly, how
do you know beforehand what threshold to choose?  Also, you're still
going to be checking every document to see if it is above or below the
threshold, so you might as well let the matcher do that...
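
To illustrate the general idea (this is only a rough sketch, not how
Xapian's matcher is actually implemented), keeping the best k results
needs no threshold chosen up front: a bounded min-heap supplies a moving
threshold as the match proceeds, and the total can be counted along the
way.  The function name top_k_with_total and the (docid, score) input
format are made up for this example.

```python
import heapq


def top_k_with_total(scored_docs, k):
    """Return (total_matches, best k as [(docid, score)], best first).

    scored_docs is any iterable of (docid, score) pairs; this input
    format is an assumption made for the sketch.
    """
    heap = []   # min-heap of (score, docid), never more than k entries
    total = 0
    for docid, score in scored_docs:
        total += 1  # every candidate is still visited and counted
        if len(heap) < k:
            heapq.heappush(heap, (score, docid))
        elif score > heap[0][0]:
            # heap[0] is the weakest result kept so far; its score acts
            # as a threshold that rises as better documents turn up.
            heapq.heapreplace(heap, (score, docid))
    best = sorted(heap, reverse=True)
    return total, [(docid, score) for score, docid in best]


if __name__ == "__main__":
    docs = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)]
    print(top_k_with_total(docs, 2))
```

Only k entries are ever held and sorted, but this naive version still
looks at each candidate once, which is the second point above.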

Cheers,
    Olly