[Xapian-discuss] Xapian performance on gmane.org compared
Olly Betts
olly at survex.com
Fri Aug 28 12:16:44 BST 2009
On Fri, Aug 28, 2009 at 10:32:07AM +0200, Henry wrote:
> Quoting "Olly Betts" <olly at survex.com>:
>> As documented on http://search.gmane.org, it's chert.
>>
>>> What's the DB size on disk?
>>
>> 138GB.
>
> That leaves me scratching my head: performing the same phrase search
> should then be a lot quicker on my DB which is only 4GB. The number of
> hits I understand will impact the performance, but still...
As I said in a bit you didn't quote, currently the gmane search doesn't
use positional information. So a phrase search is actually just an AND
search.
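To make the difference concrete, here is a toy sketch in Python. It is not Xapian code: the in-memory `index` and both function names are made up for illustration. The AND search only needs to know which documents contain each term, while the phrase search also has to read the position lists for every candidate.

```python
# Toy index: term -> {docid: [positions]}. Hypothetical data, not Xapian's API.
index = {
    "heavy": {1: [0], 2: [3], 3: [5]},
    "rain": {1: [1], 2: [0], 3: [9]},
}


def and_search(terms):
    """Return the docids that contain every term, ignoring positions."""
    docsets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*docsets) if docsets else set()


def phrase_search(terms):
    """Return the docids where the terms occur at consecutive positions."""
    hits = set()
    for doc in and_search(terms):
        # Try each occurrence of the first term as the start of the phrase.
        starts = index[terms[0]][doc]
        if any(
            all(start + i in index[t][doc] for i, t in enumerate(terms))
            for start in starts
        ):
            hits.add(doc)
    return hits
```

With this data the AND search for "heavy rain" matches all three documents, but only document 1 has the two words next to each other, so only the phrase search has to touch the position data to rule out the other two.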
What other settings are you using? If the primary ordering is by
value(s), that won't be helping.
>> As Richard says, my patch in #394 should help, but note that you can
>> tune the size of the "pond" by setting POND_SIZE in the environment.
>> The default is 100000 which was sane for the situation I wrote it for,
>> but higher or lower might be better (and I'd be interested to hear what
>> works best for other situations so we can set it sanely automatically).
>> There's no benefit in setting it higher than the number of documents
>> matched by the AND query of the terms in the phrase.
>
> Yes, I gave the patch a swing, and it halved the search time to ~15s -
> still confusing and terrible compared to the ~4s returned on 'rain'.
The patch could be improved (e.g. a min/max heap should do better in
time and space than the multimap currently used). But I suspect we
really need to do less I/O to make big savings here.
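As a rough sketch of the heap idea (in Python rather than the C++ of the patch, and not what the patch does): this assumes, purely for illustration, that the pond holds up to POND_SIZE candidate documents ranked by weight. It uses a plain bounded min-heap from `heapq` as a simpler stand-in for a true min/max heap, and the function names are hypothetical.

```python
import heapq
import os


def pond_size(default=100000):
    """Read POND_SIZE from the environment, falling back to the default."""
    return int(os.environ.get("POND_SIZE", default))


def keep_best(candidates, limit):
    """Keep the `limit` highest (weight, docid) pairs, best first."""
    if limit <= 0:
        return []
    heap = []  # min-heap: heap[0] is the weakest candidate kept so far
    for item in candidates:
        if len(heap) < limit:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Replace the weakest kept candidate with the better one.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)
```

Each insertion costs O(log limit) and the heap lives in a single flat list, which is where the time and space saving over a node-based multimap would come from.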
Just to check, if you run the test queries under "time", how do the user
and system times compare with the "real" time?
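If it's easier to script, the same three numbers can be collected from Python on a Unix-like system. This is a minimal sketch: `time_command` is a hypothetical helper, and the command you pass it would be whatever you use to run your test query.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, system) seconds for the child."""
    wall_start = time.perf_counter()
    before = os.times()
    subprocess.run(argv, check=False)
    after = os.times()
    real = time.perf_counter() - wall_start
    # CPU time charged to child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


if __name__ == "__main__":
    # Placeholder command; substitute the real search invocation.
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

If "real" is much larger than user plus system, the process is spending most of its time waiting (typically on disk I/O) rather than computing.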
> The number of docs matched in my query is only about 13k. Based on your
> last comment, tweaking POND_SIZE will have no effect.
Raising it shouldn't, but lowering it could be better - it's a bit hard to
predict exactly how things will change as the size is changed as there
are several potential effects.
> Looks like the existing default of 100,000 is indeed the sweet-spot.
That's useful to know, thanks.
Cheers,
Olly