[Xapian-discuss] Rebuilding corrupt databases from .DB files.

Olly Betts olly at survex.com
Mon Apr 16 01:56:35 BST 2012


On Mon, Apr 16, 2012 at 10:15:08AM +1000, Graham Jones wrote:
> I have already tried recreating the .baseA and iamchert files from
> copying similar databases (as these seem to be identical save for the
> UUID in iamchert) but can't get it to be usable without the .baseB
> files.

It is safe to create a new "donor" chert database and harvest its
"iamchert" file (the only problem would be if it was a different version
of the chert format, but that hasn't changed for ages, and is unlikely
to in the future).

Copying the .baseA or .baseB files from a different database isn't going
to work.

> Can someone tell me what is in the .baseB files and if their contents
> can be recreated from the .DB files if I were to write something that
> can read and process the files at a low level.

They can be recreated (as in, it is possible to write a tool to do this,
but no such tool currently exists AFAIK).

Essentially the base file has some header info, and a bitmap of used
blocks, and then the revision number repeated again - this format is
described by a comment in backends/chert/chert_btreebase.h.

But you probably don't need to write that yourself - my suggestion would
be to start from the Btree consistency checking code, which iterates the
tree from the root block, and compares the actually used blocks against
those marked as used in the bitmap.  Instead you could iterate and
create a new bitmap.

That code is in backends/chert/chert_check.cc.
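To give a rough idea of the "iterate and create a new bitmap" approach, here is a minimal Python sketch.  It is not taken from the Xapian source: children_of is a hypothetical callback you would have to write yourself (by parsing each branch block much as the checking code does), and the least-significant-bit-first packing is an assumption you should verify against chert_btreebase.h.

```python
def build_bitmap(root, children_of, block_count):
    """Walk the tree from root and mark every block reached as used.

    children_of(n) is a hypothetical callback returning the child block
    numbers of block n (an empty list for a leaf block).
    Assumption: bit (n % 8) of byte (n // 8) marks block n as used.
    """
    bitmap = bytearray((block_count + 7) // 8)
    stack = [root]
    while stack:
        n = stack.pop()
        byte, bit = divmod(n, 8)
        if bitmap[byte] & (1 << bit):
            # In a well-formed tree each block has exactly one parent.
            raise ValueError(f"block {n} reached twice")
        bitmap[byte] |= 1 << bit
        stack.extend(children_of(n))
    return bitmap
```

Blocks which are never reached from the root are left marked as free, which is what you want for stale blocks from older revisions.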

You also need to find the right root block to get you started - this
isn't entirely trivial to do in general, but you can get a list of
candidates by scanning all the blocks in the .DB file looking at
GET_LEVEL() and REVISION().
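
A scan like that could look something like the Python sketch below.  It rests on assumptions you should check against the chert source before relying on it: that the block size is 8192 (it is normally recorded in the missing base file), that REVISION() is a big-endian 4-byte integer at offset 0 of each block, and that GET_LEVEL() is the single byte at offset 4.

```python
import struct

def scan_blocks(f, block_size=8192):
    """Yield (block_number, revision, level) for each block in a .DB file.

    f is a binary file object opened for reading.  The header layout
    (">IB" at the start of each block) and the default block size are
    assumptions, not something confirmed here.
    """
    n = 0
    while True:
        f.seek(n * block_size)
        header = f.read(5)
        if len(header) < 5:
            break  # end of file
        revision, level = struct.unpack(">IB", header)
        yield n, revision, level
        n += 1

# Example use (the filename is just an illustration):
#   with open("postlist.DB", "rb") as f:
#       for n, revision, level in scan_blocks(f):
#           print(n, revision, level)
```

Only the first five bytes of each block are read, so this should be reasonably quick even on a large table.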

Naively, the right root is the one with the highest level and revision,
but there are two complications.  If the Btree has had deletes and lost
a level, then the root you want might have a lower level than an older
root block which hasn't yet been reused.  Also, there may be a higher
revision number (probably only one higher) on some blocks if there was
a revision which was partly or fully written but not committed.
If your databases were produced by compaction, then these complications
aren't a concern.
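
As a sketch of the naive ordering only, the candidates can be sorted so that the most likely root comes first.  The tuple layout (block_number, revision, level) and the choice to compare level before revision are assumptions made for this example.  For a database which wasn't produced by compaction, treat the result as an order in which to try candidates, not as an answer.

```python
def rank_candidates(blocks):
    """Sort (block_number, revision, level) tuples, best guess first.

    Naive heuristic: highest level first, ties broken by highest
    revision.  This does not handle the lost-level or uncommitted
    revision cases.
    """
    return sorted(blocks, key=lambda b: (b[2], b[1]), reverse=True)
```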

If you pick a root which isn't the latest, you'll likely fail to find
a full tree under it - you need to check that you don't hit a child
block which is newer than its parent as you iterate.  If you skip that
check and use such a root, then Xapian will throw an exception when it
tries to use that part of the database.
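
That check can be sketched in Python as below.  Both callbacks are hypothetical and would need writing: revision_of(n) returns the revision stored in block n, and children_of(n) returns the child block numbers of block n (empty for a leaf).

```python
def tree_is_consistent(root, revision_of, children_of):
    """Return False if any child block is newer than its parent.

    revision_of and children_of are hypothetical callbacks supplied by
    the caller; nothing here reads the .DB file itself.
    """
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children_of(parent):
            if revision_of(child) > revision_of(parent):
                return False  # stale root: this child was written later
            stack.append(child)
    return True
```

A candidate root for which this returns False can be discarded, and the next candidate tried.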

The baseA/baseB difference is just that one is the latest revision and
one the revision before.  If you're recreating, you can just create a
set with either name - so long as they're consistent, then baseA vs
baseB doesn't matter.

If you find the root blocks, you could probably just create a set of
base files with dummy bitmaps and use copydatabase, but that will be
slow for that much data, so recreating the bitmaps is probably
worthwhile.

Once you've recreated a set of base files, try xapian-check on the
database to make sure it looks consistent at both the Btree and
higher levels.

Good luck!

Cheers,
    Olly


