[Xapian-discuss] Xapian performance on gmane.org compared
Henry
henka at cityweb.co.za
Fri Aug 28 09:32:07 BST 2009
Quoting "Olly Betts" <olly at survex.com>:
> As documented on http://search.gmane.org, it's chert.
>
>> What's the DB size on disk?
>
> 138GB.
That leaves me scratching my head: the same phrase search should then
be a lot quicker on my DB, which is only 4GB. I understand that the
number of hits will affect performance, but still...
>> How many search servers is gmane.org using? Their approx. spec?
>
> One, which also handles indexing - see "rain" in the list here:
>
> http://gmane.org/host.php
Once again, a big head-scratcher: our machine is probably a few times
faster, and it is searching a test/sample DB that is 34x smaller.
Something doesn't add up.
> As Richard says, my patch in #394 should help, but note that you can
> tune the size of the "pond" by setting POND_SIZE in the environment.
> The default is 100000 which was sane for the situation I wrote it for,
> but higher or lower might be better (and I'd be interested to hear what
> works best for other situations so we can set it sanely automatically).
> There's no benefit in setting it higher than the number of documents
> matched by the AND query of the terms in the phrase.
Yes, I gave the patch a swing, and it halved the search time to ~15s,
but that is still confusing and terrible compared to the ~4s returned
on 'rain'. The number of docs matched by my query is only about 13k,
so based on your last comment, tweaking POND_SIZE will have no effect.
Urgh! I wish I knew what was going on. As a final comment, FYI:
all tests use the patch from #394.
Test1: 1-term phrase query, match size ~3,000
Test2: 2-term phrase query, match size ~47,000
Test3: 2-term phrase query, match size ~1,700
Test4: 3-term phrase query, match size ~13,500

Search time by POND_SIZE:

POND_SIZE   Test1   Test2   Test3   Test4
   10,000   2.50s     22s      6s    8.4s
   25,000   2.49s     59s     18s    8.3s
   50,000   2.49s     21s      7s    8.3s
  100,000   2.46s     23s      6s    8.0s
  200,000   2.48s    197s      6s    8.3s
It looks like the existing default of 100,000 is indeed the sweet spot.
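To repeat a sweep like this more systematically, each run can be given its own POND_SIZE through the child process's environment, which is how the setting is picked up according to the advice quoted above. This is a rough sketch, assuming a search program linked against the patched Xapian build; the command line is a placeholder you supply, not a real Xapian tool, and `time_with_pond_size` is a hypothetical name.

```python
import os
import subprocess
import sys
import time


def time_with_pond_size(command, pond_size, runs=3):
    """Run `command` with POND_SIZE set in its environment.

    Returns the best (minimum) wall-clock time in seconds over `runs`
    runs. Raises CalledProcessError if the command exits non-zero.
    """
    # Copy the current environment and override only POND_SIZE.
    env = dict(os.environ, POND_SIZE=str(pond_size))
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, env=env, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder: pass your own search command and query as arguments.
    # With no arguments it just times an empty Python process.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for size in (10_000, 25_000, 50_000, 100_000, 200_000):
        print(f"POND_SIZE: {size:,}: {time_with_pond_size(cmd, size):.2f}s")
```

Taking the best of several runs per setting reduces the effect of one-off slow runs (for example, a cold disk cache), which makes it easier to tell whether an outlier such as Test2's 197s at 200,000 is repeatable.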