[Xapian-devel] dividing B-trees

Wed Dec 29 17:57:49 GMT 2004

[I originally sent this mail to Martin and Richard, but then realised
it's of wider interest, so I'm resending it to the list with Martin's
reply attached (with his consent)]

I was talking to Richard about trying to use shortened dividing keys in
the B-trees.  Shorter keys are better because you can then fit more in
each block, so you need fewer B-tree levels for the same data.

Quartz only shortens keys when going from the leaf level to the next
level above, which Richard and I thought was a missed opportunity.  But
I tried doing this for higher (if the root is the top) levels, but it
doesn't work because there may be a lower sorting key which is the wrong
side of the truncated key.  For example:

         |   mat   |   muz   |                
        /          |          \
       /           |           \
      |            |            | 
      V            V            V
[... mad] [matter ... muscat] [muzzle ...]   <--(keys in leaf blocks)

If we try to truncate a key to separate "mat" and "muz", we get "mu",
but this divides "matter" and "muscat" which are in the same block.
The higher level dividing key must match a leaf level block division,
which is clearly why the existing code does what it does!

But what if we approach this from the other direction.  Say we have a
leaf block "[matter ... muscat muzzle ... needy]" which we want to
split.  Instead of splitting as close to halfway as we can (which it
turns out is between "muscat" and "muzzle"), we try to find a split
point with a short dividing key (but also one as close to halfway as we
can manage given this primary criterion).  So in this case, our dividing
key would be "n" instead of "muz".

This seems plausible, except it creates blocks less than half full.
I've not looked at this myself, but Richard said something about
the proof of the complexity of B-tree operations assumed blocks were
always half-full (or something like that).

What if we always ensure that we pick a key such that they're at least
1/3 full, say?  So we look for the best split point in the middle 1/3,
which might be "mu" instead of "n", but that's still shorter than "muz".

Except that in sequential mode (which we will usually be in for large
database builds) we split at the insertion point.

Actually, sometimes it may be worth splitting slightly before the
insertion point.

Say that we can split at the insertion point and get a 240 character
dividing key, or one item back to get a 1 character dividing key.  If we
split at the insertion point, we can expect to use an extra 239 bytes
per level in the B-tree, whereas by splitting one earlier, we waste the
size of the item which we could fit here, but choose not to.  If the
tags are small and the Btree has several levels, that could easily be a
win.  But this example is rather extreme...

Or perhaps there's some other cunning way to go about this...

Cheers,
    Olly
-------------- next part --------------
An embedded message was scrubbed...
From: martin.porter at grapeshot.co.uk (Martin Porter)
Subject: Re: dividing B-trees
Date: Thu, 16 Dec 2004 21:30:28 GMT
Size: 1747
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20041229/5c2e6792/attachment.mht>