[Xapian-discuss] Rebuilding corrupt databases from .DB files.

Dan Colish dcolish at gmail.com
Mon Apr 16 16:07:11 BST 2012


On Sun, Apr 15, 2012 at 5:56 PM, Olly Betts <olly at survex.com> wrote:

> On Mon, Apr 16, 2012 at 10:15:08AM +1000, Graham Jones wrote:
> > I have already tried recreating the .baseA and iamchert files by
> > copying them from similar databases (as these seem to be identical save
> > for the UUID in iamchert), but can't get the database to be usable
> > without the .baseB files.
>
> It is safe to create a new "donor" chert database and harvest its
> "iamchert" file (the only problem would be if it was a different version
> of the chert format, but that hasn't changed for ages, and is unlikely
> to in the future).
>
> Copying the .baseA or .baseB files from a different database isn't going
> to work.
>
> > Can someone tell me what is in the .baseB files, and whether their
> > contents could be recreated from the .DB files if I were to write
> > something that can read and process the files at a low level?
>
> They can be recreated (as in, it is possible to write a tool to do this,
> but no such tool currently exists AFAIK).
>
> Essentially the base file has some header info, a bitmap of used
> blocks, and then the revision number repeated; this format is
> described by a comment in backends/chert/chert_btreebase.h.
>
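For illustration, a minimal C++ sketch of the shape described above: a revision number, a bitmap with one bit per block, and the revision repeated at the end. The struct and its member names are hypothetical, and it models only the in-memory shape, not the on-disk encoding (field widths, byte order, bit order, other header fields), which is what the comment in backends/chert/chert_btreebase.h documents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a chert base file, NOT its on-disk format.
// Names are invented; see chert_btreebase.h for the real layout.
struct BaseFileSketch {
    std::uint32_t revision = 0;        // revision recorded at the start
    std::vector<std::uint8_t> bitmap;  // one bit per block: 1 = in use
    std::uint32_t revision_again = 0;  // the same revision, repeated at the end

    // Mark block n as used, growing the bitmap if needed.
    void mark_used(std::uint32_t n) {
        std::size_t byte = n / 8;
        if (byte >= bitmap.size()) bitmap.resize(byte + 1, 0);
        bitmap[byte] |= static_cast<std::uint8_t>(1u << (n % 8));
    }

    // True if block n is marked as used; blocks past the bitmap count as free.
    bool is_used(std::uint32_t n) const {
        std::size_t byte = n / 8;
        return byte < bitmap.size() && ((bitmap[byte] >> (n % 8)) & 1u);
    }

    // Both copies of the revision should agree in a completely written file.
    bool revisions_match() const { return revision == revision_again; }
};
```

Repeating the revision presumably lets a reader notice a base file that was only partly written, which is what revisions_match() models; that reading is our assumption, not something stated in the thread.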
> But you probably don't need to write that yourself - my suggestion would
> be to start from the Btree consistency checking code, which iterates the
> tree from the root block, and compares the actually used blocks against
> those marked as used in the bitmap.  Instead of comparing, you could
> iterate and build a new bitmap.
>
> That code is in backends/chert/chert_check.cc.
>
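A minimal sketch of that suggestion, assuming the blocks have already been decoded into a hypothetical map from block number to a node record listing its child block numbers (chert_check.cc itself works on raw blocks, and NodeSketch, BlockMap and rebuild_bitmap are invented names). Instead of comparing against an existing bitmap, it sets a bit for every block reachable from the root:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical decoded form of a Btree block; not a Xapian type.
struct NodeSketch {
    int level = 0;                        // 0 = leaf
    std::vector<std::uint32_t> children;  // child block numbers (empty for leaves)
};

using BlockMap = std::map<std::uint32_t, NodeSketch>;

// Build a fresh "used blocks" bitmap (one bool per block) by visiting every
// block reachable from the root. Throws if a referenced block is unknown.
std::vector<bool> rebuild_bitmap(const BlockMap& blocks, std::uint32_t root,
                                 std::uint32_t total_blocks) {
    std::vector<bool> used(total_blocks, false);
    std::vector<std::uint32_t> stack{root};
    while (!stack.empty()) {
        std::uint32_t n = stack.back();
        stack.pop_back();
        auto it = blocks.find(n);
        if (it == blocks.end() || n >= total_blocks)
            throw std::runtime_error("reference to unknown block");
        // A well-formed tree never reaches a block twice; skip to avoid looping.
        if (used[n]) continue;
        used[n] = true;
        for (std::uint32_t child : it->second.children) stack.push_back(child);
    }
    return used;
}
```

Blocks left unmarked are the ones a rebuilt base file would report as free.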
> You also need to find the right root block to get you started.  This
> isn't entirely trivial to do in general, but you can get a list of
> candidates by scanning all the blocks in the .DB file and looking at
> GET_LEVEL() and REVISION().
>
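A minimal C++ sketch of that scan. It assumes you know the table's block size, and it assumes the header layout we believe the REVISION() and GET_LEVEL() macros in backends/chert/chert_table.h decode: a 4-byte big-endian revision at offset 0 and a 1-byte level at offset 4. Both assumptions should be checked against the Xapian source the databases were built with; BlockHeader, decode_header_assumed and scan_blocks are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // lets you test scan_blocks with std::istringstream
#include <vector>

struct BlockHeader {
    std::uint32_t block_no;
    std::uint32_t revision;
    int level;
};

// ASSUMED layout (verify against REVISION()/GET_LEVEL() in chert_table.h):
// bytes 0..3 = big-endian revision, byte 4 = level.
BlockHeader decode_header_assumed(const unsigned char* b, std::uint32_t block_no) {
    std::uint32_t rev = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
                        (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
    return BlockHeader{block_no, rev, b[4]};
}

// Read a .DB file block by block and record each block's revision and level.
// block_size must match the table's block size; a trailing partial block is ignored.
std::vector<BlockHeader> scan_blocks(std::istream& in, std::size_t block_size) {
    assert(block_size >= 5);
    std::vector<BlockHeader> headers;
    std::vector<char> buf(block_size);
    for (std::uint32_t n = 0;
         in.read(buf.data(), static_cast<std::streamsize>(block_size)); ++n) {
        headers.push_back(decode_header_assumed(
            reinterpret_cast<const unsigned char*>(buf.data()), n));
    }
    return headers;
}
```

For a real table you would pass a std::ifstream opened with std::ios::binary on, for example, postlist.DB.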
> Naively, the right root is the one with the highest level and revision,
> but there are two complications.  First, if the Btree has had deletes
> and lost a level, the root you want may have a lower level than an
> older root block which hasn't yet been reused.  Second, some blocks may
> carry a higher revision number (probably only one higher) if a revision
> was partly or fully written but not committed.
> If your databases were produced by compaction, then these complications
> aren't a concern.
>
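One heuristic for turning scan results into an ordered list of roots to try (our own, not something the thread spells out): for each revision, keep the highest-level block carrying that revision, then order those by revision, newest first. This puts a newer, shorter root ahead of an older, taller one left over from before the tree lost a level; an uncommitted revision may still come first, so each candidate has to be validated by walking the tree under it. For a compacted database the first entry should be the answer. BlockHeader and root_candidates are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct BlockHeader {
    std::uint32_t block_no;
    std::uint32_t revision;
    int level;
};

// Heuristic (hypothetical, not Xapian's): for each revision keep the block
// with the highest level, then order those candidates newest revision first.
// Every candidate must still be validated by walking the tree below it.
std::vector<BlockHeader> root_candidates(const std::vector<BlockHeader>& headers) {
    std::map<std::uint32_t, BlockHeader> best;  // revision -> tallest block seen
    for (const BlockHeader& h : headers) {
        auto it = best.find(h.revision);
        if (it == best.end())
            best.emplace(h.revision, h);
        else if (h.level > it->second.level)
            it->second = h;
    }
    std::vector<BlockHeader> out;
    for (auto it = best.rbegin(); it != best.rend(); ++it) out.push_back(it->second);
    return out;
}
```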
> If you pick a root which isn't the latest, you'll likely fail to find
> a full tree under it, so as you iterate you need to check that you
> never hit a child block which is newer than its parent.  If you skip
> that check, Xapian will throw an exception when it tries to use that
> part of the database.
>
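A minimal sketch of that check, over a hypothetical decoded form of the blocks (DecodedBlock and tree_is_consistent are invented names). It walks the tree under a candidate root and rejects it as soon as a child block carries a newer revision than the block pointing to it; it also rejects dangling references and blocks reached twice, which a well-formed tree should not contain.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical decoded block: its revision and the block numbers it points to.
struct DecodedBlock {
    std::uint32_t revision = 0;
    std::vector<std::uint32_t> children;  // empty for leaf blocks
};

// Walk the tree under `root`. Returns false if a referenced block is missing,
// a block is reached twice, or a child carries a newer revision than its
// parent, which suggests `root` is not the latest committed root.
bool tree_is_consistent(const std::map<std::uint32_t, DecodedBlock>& blocks,
                        std::uint32_t root) {
    if (blocks.count(root) == 0) return false;
    std::set<std::uint32_t> visited{root};
    std::vector<std::uint32_t> stack{root};
    while (!stack.empty()) {
        const DecodedBlock& parent = blocks.at(stack.back());
        stack.pop_back();
        for (std::uint32_t c : parent.children) {
            auto it = blocks.find(c);
            if (it == blocks.end()) return false;                     // dangling reference
            if (it->second.revision > parent.revision) return false;  // child newer than parent
            if (!visited.insert(c).second) return false;              // reached twice
            stack.push_back(c);
        }
    }
    return true;
}
```

In practice you would run this on each candidate root in turn, keep the first one that passes, and then confirm the result with xapian-check.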
> The baseA/baseB difference is just that one is the latest revision and
> the other is the revision before.  If you're recreating, you can just
> create a set with either name; so long as the files are consistent,
> baseA vs baseB doesn't matter.
>
> If you find the root blocks, you could probably just create a set of
> base files with dummy bitmaps and use copydatabase, but that will be
> slow for that much data, so recreating the bitmaps is probably
> worthwhile.
>
> Once you've recreated a set of base files, run xapian-check on the
> database to make sure it looks consistent at both the Btree and
> higher levels.
>
> Good luck!
>
> Cheers,
>    Olly

Sounds like you've gotten into quite a tough situation. I hope we can help
you recover this data. I'm really interested in hearing about the process
and outcome of this recovery, as it may give us ideas for improving our
recovery tools. If you are able to share updates as you make progress, it
would be greatly appreciated!

--Dan

