A corrupt index

Olly Betts olly at survex.com
Thu Nov 18 19:57:38 GMT 2021


On Wed, Nov 17, 2021 at 12:27:39PM +0100, Adam Sjøgren wrote:
> Olly writes:
> 
> > On Mon, Nov 15, 2021 at 09:45:15PM +0100, Adam Sjøgren wrote:
> >>     termlist:
> >>     baseA blocksize=8K items=20885830 lastblock=10207839 revision=5434326 levels=3 root=7090
> >>     Failed to check B-tree: DatabaseError: Key >= right dividing key in level above
> >
> > This error is essentially saying that check found a branch node where
> > the key which is meant to partition two nodes below doesn't actually do
> > so.  Maybe replacing the bad dividing key with one which actually
> > partitions the nodes below (assuming they are actually partitioned)
> > would give a sensible working database, but it's hard to know without
> > trying it.
> 
> I have a readily available copy of the index that I can experiment on,
> so that would be interesting to try.
> 
> Any tips for what to do in hexl-mode in Emacs? Or another way, I guess:
> 
>     $ ls -ltrh termlist.*
>     -rw-r--r-- 1 db db 1.3M Nov 17 10:57 termlist.baseB
>     -rw-r--r-- 1 db db  78G Nov 17 10:57 termlist.DB
>     -rw-r--r-- 1 db db 1.3M Nov 17 10:57 termlist.baseA

The .DB file is a sequence of fixed-size blocks, which are 8KB unless
you've explicitly specified otherwise.  The format of the blocks is
described in comments in backends/chert/chert_table.cc and chert_table.h.
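
As a rough illustration of getting at one block for inspection, here is a
minimal Python sketch.  It assumes the blocks are stored back to back
starting at offset 0, so block N begins at byte N * blocksize; check that
against the comments in chert_table.cc before relying on it.  The helper
names are made up for this example, and it only returns raw bytes without
interpreting the block format.

```python
BLOCK_SIZE = 8192  # 8KB, matching "blocksize=8K" in the check output


def block_offset(block_number, block_size=BLOCK_SIZE):
    """Byte offset of a block, ASSUMING blocks are contiguous from offset 0."""
    return block_number * block_size


def expected_file_size(last_block, block_size=BLOCK_SIZE):
    """Size of a .DB file holding blocks 0..last_block inclusive (same assumption)."""
    return (last_block + 1) * block_size


def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Return the raw bytes of one block; the contents are not decoded here."""
    with open(path, "rb") as f:
        f.seek(block_offset(block_number, block_size))
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError("short read: block %d is not fully present" % block_number)
    return data
```

With lastblock=10207839 from the termlist output, expected_file_size()
gives 83622625280 bytes, roughly 77.9 GiB, which is consistent with the
78G that ls reports for termlist.DB.  Under the same assumption the root
block (root=7090) would start at byte 58081280.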

However, it occurs to me you could try xapian-compact on the bad
database (if you didn't already).  That reads the entries in the table
sequentially, and the dividing keys don't matter for a sequential read,
so it may give you a good copy.  I'd try that first.
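
A minimal sketch of trying that on the experimental copy, driven from
Python.  xapian-compact and xapian-check are the real tools, but the
helper names are invented for this example and the paths in the usage
note are placeholders.  It also assumes both tools report failure through
a non-zero exit status.

```python
import subprocess


def compact_command(source_db, dest_db):
    """Argument list for: xapian-compact SOURCE DEST."""
    return ["xapian-compact", source_db, dest_db]


def check_command(db):
    """Argument list for: xapian-check DB."""
    return ["xapian-check", db]


def compact_and_check(source_db, dest_db):
    """Compact into a new directory, then run the checker on the result.

    Raises subprocess.CalledProcessError if either tool exits non-zero
    (assumed here to be how a failure is reported).
    """
    subprocess.run(compact_command(source_db, dest_db), check=True)
    subprocess.run(check_command(dest_db), check=True)
```

For example, compact_and_check("/path/to/index-copy", "/path/to/index-compacted")
writes the compacted database to a new directory and leaves the source
untouched, so it is safe to run against the copy first.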

> >>     position:
> >>     baseA blocksize=8K items=13525517523 lastblock=118173891 revision=5434326 levels=4 root=6551
> >>     Failed to check B-tree: DatabaseError: Table entry count says 13525517523 but actually counted 13525517278
> >
> > This error is just that the record of how many entries there are is
> > wrong (this count is stored since it's useful to know in some cases, and
> > expensive to compute by scanning the whole table).  It shouldn't get out
> > of step with the actual number of entries in the table, but since no
> > other errors are reported just fixing the metadata record seems
> > reasonable.
> 
> Ok, sounds good. If you have a tip on how to fix the metadata here
> too, I would be grateful.

Compacting would fix this count too.  If you'd rather patch it by hand,
the count is stored in the base file (baseA is the active one, judging by
the output above); see the comments in backends/chert/chert_btreebase.cc
for the format of that file.

Cheers,
    Olly



More information about the Xapian-discuss mailing list