[Xapian-discuss] Filesystems

Olly Betts olly at survex.com
Tue Jul 14 05:47:10 BST 2009


On Mon, Jul 13, 2009 at 04:59:04PM +0100, Richard Boulton wrote:
> 2009/7/13 Arjen van der Meijden <acmmailing at tweakers.net>:
> > database          run1    run2
> > non-compacted     104.8   105.4
> > fuller compact    99.5    100.5
> > block size 2kb    135.8   136.5
> > block size 4kb    136.5   133.9
> > block size 8kb    100.9   101.6
> > block size 16kb   82.1    81.8
> > block size 32kb   79.4    79.0

My own tests a while ago of different blocksizes on a more modestly
specced machine suggested there were small gains to be had from larger
blocksizes, but with diminishing returns: 32K wasn't much better than
16K, and 64K was pretty much the same as 32K.  I couldn't find the
analysed timings, so sorry for the vague summary.

> Coo, that's quite a difference.  Well worth knowing about.  I'd be
> fascinated to know whether your older hardware has a significantly
> different sweet-spot - I'd expect smaller blocks to result in a bit
> less CPU, since skipping through runs in the blocks is currently a bit
> expensive (more expensive than I'd like!).

You're confusing block size and chunksize.  The postlist chunksize is
always a bit under 2K currently (largely so we should be able to fit 4
in an 8K block), which probably partly explains why <8K is slower.

It's possible that tweaking the chunksize would help.  The idea is that
the key and other block overhead should fit in the "spare" 48 * 4 = 192
bytes (8192 - 4 * 2000), but I've not done a lot of testing of that.  As
the comment above the definition notes, a chunk can go a few bytes over
the chunksize limit (we move to a new chunk when we reach or exceed it):

backends/flint/flint_postlist.cc:const unsigned int CHUNKSIZE = 2000;
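
To make the arithmetic concrete, here is a rough sketch of how many
full-size chunks fit in a block and how much is left over.  Only the
CHUNKSIZE value comes from the source line above; chunks_per_block() and
spare_bytes() are hypothetical helpers for illustration, and this ignores
the real block layout and the few bytes a chunk may run over the limit.

```cpp
// Value quoted above from backends/flint/flint_postlist.cc.
constexpr unsigned int CHUNKSIZE = 2000;

// Hypothetical helper: how many full-size postlist chunks fit in one block.
constexpr unsigned int chunks_per_block(unsigned int block_size) {
    return block_size / CHUNKSIZE;
}

// Hypothetical helper: bytes left over in the block for the keys and
// other block overhead once those chunks are accounted for.
constexpr unsigned int spare_bytes(unsigned int block_size) {
    return block_size - chunks_per_block(block_size) * CHUNKSIZE;
}
```

With these definitions, chunks_per_block(8192) is 4 and spare_bytes(8192)
is 192, i.e. the 48 * 4 bytes mentioned above.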

As Richard says, skipping within a run has linear cost, so making this
larger with a larger blocksize may not be an overall win.  Or it might
be...
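
A minimal sketch of why the cost is linear, assuming a simplified model
in which a chunk holds a first docid followed by docid gaps, so that
reaching a target means decoding entries one at a time from the start.
SkipResult and skip_to() are hypothetical names, and this is not the
actual flint chunk encoding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: where the scan stopped and what it cost.
struct SkipResult {
    bool found;           // true if some entry has docid >= target
    std::uint32_t docid;  // last docid decoded (the match when found)
    std::size_t steps;    // number of entries decoded
};

// Hypothetical model of skipping within one chunk: walk the gap-encoded
// entries from the start until we reach a docid >= target.
constexpr SkipResult skip_to(std::uint32_t first_docid,
                             const std::uint32_t* gaps,
                             std::size_t n_gaps,
                             std::uint32_t target) {
    std::uint32_t docid = first_docid;
    std::size_t steps = 1;  // decoding the first entry counts as one step
    if (docid >= target) return SkipResult{true, docid, steps};
    for (std::size_t i = 0; i < n_gaps; ++i) {
        docid += gaps[i];  // each entry must be decoded in turn
        ++steps;
        if (docid >= target) return SkipResult{true, docid, steps};
    }
    return SkipResult{false, docid, steps};
}
```

In this model the worst case decodes every entry in the chunk, so a chunk
holding twice as many entries costs up to twice as much to skip through.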

In the chert backend in 1.1.x, the postlists no longer contain the
document length, so more entries fit in the same sized chunk and the
sweet spot is likely to be different.

Cheers,
    Olly


