[Xapian-devel] dividing B-trees

Olly Betts olly at survex.com
Thu Dec 30 15:39:02 GMT 2004


Martin Porter writes:
> Your idea of splitting blocks at points which generate small index keys is
> ingenious. I suppose one is sacrificing data space to get tighter index
> space, which would lead to faster retrieval -- the customary tradeoff of
> space for time. 

It may not even be a trade - it's conceivable that we can save more on
index space than we sacrifice on data space, especially in tables where
the key names are long and the tags fairly short (this is largely true
for of the position table).

The other point I've realised is that in the postlist table we insert
throughout the table even when we're just appending documents to the
database (which is a very common case we want to handle especially
well).  In this case, choosing the split point to fill the block right
now is probably futile, as it won't be optimal once more tags have been
inserted.

So perhaps I should start by trying this out on the non-sequential
addition case, and see what difference it makes to the postlist table
size and structure.

Cheers,
    Olly




More information about the Xapian-devel mailing list