[Xapian-discuss] Filesystems
Olly Betts
olly at survex.com
Tue Jul 14 05:47:10 BST 2009
On Mon, Jul 13, 2009 at 04:59:04PM +0100, Richard Boulton wrote:
> 2009/7/13 Arjen van der Meijden <acmmailing at tweakers.net>:
> > database          run1    run2
> > non-compacted    104.8   105.4
> > fuller compact    99.5   100.5
> > block size 2kb   135.8   136.5
> > block size 4kb   136.5   133.9
> > block size 8kb   100.9   101.6
> > block size 16kb   82.1    81.8
> > block size 32kb   79.4    79.0
My own tests a while ago of different blocksizes on a more modestly
specced machine suggested there were small gains to be had from
larger blocksizes, but with diminishing returns: 32K wasn't much better
than 16K, and 64K was pretty much the same as 32K. I couldn't find the
analysed timings, so sorry for the vague summary.
> Coo, that's quite a difference. Well worth knowing about. I'd be
> fascinated to know whether your older hardware has a significantly
> different sweet-spot - I'd expect smaller blocks to result in a bit
> less CPU, since skipping through runs in the blocks is currently a bit
> expensive (more expensive than I'd like!).
You're confusing block size and chunksize. The postlist chunksize is
always a bit under 2K currently (largely so we should be able to fit 4
in an 8K block), which probably partly explains why <8K is slower.
It's possible that tweaking the chunksize would help - the idea is that
the key and other block overhead should fit in the "spare" 48 * 4 bytes
but I've not done a lot of testing of that. As the comment above it
notes, we can go a few bytes over this limit (we move to a new chunk
when we reach or exceed this limit):
backends/flint/flint_postlist.cc:const unsigned int CHUNKSIZE = 2000;
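The arithmetic behind the "spare" 48 * 4 bytes can be sketched as below. This is only an illustration: CHUNKSIZE is the value quoted above, but chunks_per_block() and spare_bytes() are hypothetical helpers, not Xapian functions, and the model ignores keys and all other per-block overhead (which is exactly what the spare bytes are meant to absorb).

```cpp
#include <cassert>

// Value quoted from backends/flint/flint_postlist.cc.
const unsigned int CHUNKSIZE = 2000;

// Hypothetical helper (not part of Xapian): number of whole chunks of
// chunk_size bytes that fit in a block of block_size bytes, ignoring
// keys and any other block overhead.
inline unsigned int chunks_per_block(unsigned int block_size,
                                     unsigned int chunk_size = CHUNKSIZE) {
    assert(chunk_size != 0);
    return block_size / chunk_size;
}

// Hypothetical helper: bytes left in the block once those whole chunks
// are placed. This is the space available for keys and block overhead.
inline unsigned int spare_bytes(unsigned int block_size,
                                unsigned int chunk_size = CHUNKSIZE) {
    return block_size - chunks_per_block(block_size, chunk_size) * chunk_size;
}
```

Under this simplified model, an 8K block (8192 bytes) holds 4 chunks with 8192 - 4 * 2000 = 192 = 48 * 4 bytes to spare, while 2K and 4K blocks hold only 1 and 2 chunks, with 48 and 96 bytes to spare respectively.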
As Richard says, skipping within a run has linear cost, so making this
larger with a larger blocksize may not be an overall win. Or it might
be...
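To illustrate why skipping within a chunk is linear, here is a minimal sketch. It assumes a simplified model in which a chunk stores its first docid followed by gaps to each subsequent docid, so entries have to be decoded in order. SkipResult and skip_to() are hypothetical names for this example and this is not the flint implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct SkipResult {
    unsigned int docid;  // first docid >= target, or 0 if none in the chunk
    std::size_t steps;   // number of entries examined to get there
};

// Hypothetical model of skipping inside one chunk. Because each docid is
// only known relative to the previous one, every entry before the target
// must be visited, so the cost grows linearly with the number of entries.
inline SkipResult skip_to(unsigned int first_docid,
                          const std::vector<unsigned int>& gaps,
                          unsigned int target) {
    unsigned int did = first_docid;
    std::size_t steps = 1;
    if (did >= target) return {did, steps};
    for (unsigned int gap : gaps) {
        did += gap;
        ++steps;
        if (did >= target) return {did, steps};
    }
    return {0, steps};  // target lies beyond this chunk
}
```

In this model a chunk holding twice as many entries costs up to twice as many steps per skip, which is the trade-off against needing fewer chunks overall.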
In the chert backend in 1.1.x, the postlists no longer contain the
document length, so more entries fit in the same sized chunk and the
sweet spot is likely to be different.
Cheers,
Olly