[Xapian-discuss] Xapian performance on gmane.org compared

Olly Betts olly at survex.com
Fri Aug 28 12:16:44 BST 2009


On Fri, Aug 28, 2009 at 10:32:07AM +0200, Henry wrote:
> Quoting "Olly Betts" <olly at survex.com>:
>> As documented on http://search.gmane.org, it's chert.
>>
>>> What's the DB size on disk?
>>
>> 138GB.
>
> That leaves me scratching my head:  performing the same phrase search  
> should then be a lot quicker on my DB which is only 4GB.  The number of 
> hits I understand will impact the performance, but still...

As I said in a bit you didn't quote, currently the gmane search doesn't
use positional information.  So a phrase search is actually just an AND
search.
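
To illustrate the difference, here is a minimal sketch in plain Python (not Xapian code; the toy index and the names build_index, and_search and phrase_search are made up for this example). Without positional information only the AND step is possible, so documents containing the words in any order still match:

```python
from collections import defaultdict

def build_index(docs):
    """Map term -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def and_search(index, terms):
    """Docs containing every term, ignoring where the terms occur."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()

def phrase_search(index, terms):
    """Docs where the terms occur at consecutive positions, in order."""
    hits = set()
    for doc_id in and_search(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.add(doc_id)
    return hits

# Hypothetical three-document collection.
docs = {1: "heavy rain in spain", 2: "rain was heavy", 3: "no weather here"}
index = build_index(docs)
print(sorted(and_search(index, ["heavy", "rain"])))     # [1, 2]
print(sorted(phrase_search(index, ["heavy", "rain"])))  # [1]
```

The phrase check needs the position lists for every candidate document, which is the extra work (and extra I/O) that an AND-only search avoids.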

What other settings are you using?  If the primary ordering is by
value(s), that won't be helping.

>> As Richard says, my patch in #394 should help, but note that you can
>> tune the size of the "pond" by setting POND_SIZE in the environment.
>> The default is 100000 which was sane for the situation I wrote it for,
>> but higher or lower might be better (and I'd be interested to hear what
>> works best for other situations so we can set it sanely automatically).
>> There's no benefit in setting it higher than the number of documents
>> matched by the AND query of the terms in the phrase.
>
> Yes, I gave the patch a swing, and it halved the search time to ~15s -  
> still confusing and terrible compared to the ~4s returned on 'rain'.

The patch could be improved (e.g. a min/max heap should do better in
time and space than the multimap currently used).  But I suspect we
really need to do less I/O to make big savings here.
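
As a rough sketch of the heap idea, assuming the pond is a size-capped collection of weighted candidates from which the lowest-weighted entry is dropped when it overflows (that is my reading, not a description of the patch itself). This uses Python's heapq, which is a plain binary min-heap rather than a true min/max heap, so it only makes the eviction side cheap; BoundedPond and its methods are hypothetical names:

```python
import heapq

class BoundedPond:
    """Keep at most `capacity` (weight, doc_id) pairs, evicting the lowest weight."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._heap = []  # min-heap: the smallest weight sits at index 0

    def add(self, weight, doc_id):
        """Insert a candidate; return the evicted pair, or None if nothing was evicted."""
        item = (weight, doc_id)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
            return None
        # Push the new item, then pop the smallest (which may be the new item).
        return heapq.heappushpop(self._heap, item)

    def best_first(self):
        """Candidates ordered from highest to lowest weight (an O(n log n) sort)."""
        return sorted(self._heap, reverse=True)

pond = BoundedPond(capacity=2)
pond.add(0.5, 1)
pond.add(0.9, 2)
print(pond.add(0.1, 3))   # (0.1, 3)
print(pond.add(0.7, 4))   # (0.5, 1)
print(pond.best_first())  # [(0.9, 2), (0.7, 4)]
```

A real min/max heap would also give cheap access to the highest-weighted entry, which this sketch only gets by sorting.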

Just to check, if you run the test queries under "time", how do the user
and system times compare with the "real" time?
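
For anyone who wants the same comparison from a script, here is a minimal sketch using only the Python standard library (the helper names time_command and cpu_share are made up; the children_* fields of os.times() are only meaningful on Unix-like systems):

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return (real, user, system) seconds, like the shell's `time`."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    real = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system

def cpu_share(real, user, system):
    """Fraction of the wall-clock time that was spent on the CPU."""
    return (user + system) / real

# Stand-in command; substitute the actual search command being tested.
real, user, system = time_command([sys.executable, "-c", "pass"])
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
print(f"CPU share: {cpu_share(real, user, system):.0%}")
```

If user plus system is only a small fraction of real, the process spent most of its time waiting, typically on disk I/O. A share near 100% points at CPU-bound work instead, and a multi-threaded process can exceed 100%.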

> The number of docs matched in my query is only about 13k.  Based on your 
> last comment, tweaking POND_SIZE will have no effect.

Raising it shouldn't, but lowering it could be better - it's a bit hard to
predict exactly how things will change as the size is changed as there
are several potential effects.
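
To make the arithmetic concrete, a small sketch (the function names are hypothetical, and how the patch actually parses POND_SIZE is an assumption here; 13000 stands in for "about 13k"):

```python
import os

DEFAULT_POND_SIZE = 100000  # the default mentioned earlier in this thread

def pond_size_from_env(environ=os.environ):
    """Read POND_SIZE from the environment, falling back to the default.

    Assumption: a missing, non-numeric or non-positive value means "use the default".
    """
    try:
        value = int(environ.get("POND_SIZE", ""))
    except ValueError:
        return DEFAULT_POND_SIZE
    return value if value > 0 else DEFAULT_POND_SIZE

def effective_pond_size(pond_size, and_matches):
    """The pond can never hold more than the documents matched by the AND query."""
    return min(pond_size, and_matches)

and_matches = 13000  # illustrative figure for "about 13k" matching documents
for setting in (1000000, 100000, 5000):
    print(setting, "->", effective_pond_size(setting, and_matches))
# 1000000 -> 13000
# 100000 -> 13000
# 5000 -> 5000
```

So with about 13k matches, any setting at or above that figure behaves the same, and only values below it change what happens.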

> Looks like the existing default of 100,000 is indeed the sweet-spot.

That's useful to know, thanks.

Cheers,
    Olly
