[Xapian-devel] postlist chunking

Olly Betts olly at survex.com
Thu Aug 26 18:08:36 BST 2004


On Mon, Aug 23, 2004 at 03:03:20PM +0100, Olly Betts wrote:
> But how about calculating the max tag size taking into account the true
> key length and imposing the threshold exactly?
> 
> Doing this actually appears to be noticeably worse than 2000, though a
> bit better than 2048.  Not quite sure why - maybe random factors from
> packing shorter tags make as much difference, or maybe I was off by one
> (or a few) in my size calculations (I've double checked them, but
> perhaps I misunderstood the structure).

I talked about this with Richard Boulton (in person), and he suggested
that the problem is that there will typically be lots of small key/tag
pairs, and chunking to exactly fill blocks can end up being
counterproductive because one of these small pairs can easily push a
chunk just past the end of a block, so that the chunk fails to fit.

That seems plausible, and 2048 is definitely a poor threshold, so I
think I'll just go with 2000 for now.  We should revisit this later and
probably get the postlist chunking and the lower-level tag splitting to
actually talk to each other.

Cheers,
    Olly