A corrupt index

Wed Nov 17 11:27:39 GMT 2021

Olly writes:

> On Mon, Nov 15, 2021 at 09:45:15PM +0100, Adam Sjøgren wrote:
>>     termlist:
>>     baseA blocksize=8K items=20885830 lastblock=10207839 revision=5434326 levels=3 root=7090
>>     Failed to check B-tree: DatabaseError: Key >= right dividing key in level above
>
> This error is essentially saying that check found a branch node where
> the key which is meant to partition two nodes below doesn't actually do
> so.  Maybe replacing the bad dividing key with one which actually
> partitions the nodes below (assuming they are actually partitioned)
> would give a sensible working database, but it's hard to know without
> trying it.

I have a readily available copy of the index that I can experiment on,
so that would be interesting to try.
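
To make the invariant concrete, here is a toy sketch in Python of what that check is verifying and what "replacing the bad dividing key" would mean. It is not chert's on-disk format or Xapian's code: the list-based node layout and the names check_partition and repaired_dividing_key are made up for illustration, and it assumes the convention that a dividing key is greater than every key in the child to its left and no greater than any key in the child to its right.

```python
def check_partition(dividing_keys, children):
    """Report keys that fall on the wrong side of a dividing key.

    children[i] is the list of keys in the i-th child node, and
    dividing_keys[i] is meant to separate children[i] from children[i + 1].
    (Hypothetical in-memory model, not the chert block layout.)
    """
    problems = []
    for i, keys in enumerate(children):
        left = dividing_keys[i - 1] if i > 0 else None
        right = dividing_keys[i] if i < len(dividing_keys) else None
        for key in keys:
            if left is not None and key < left:
                problems.append((i, key, "key < left dividing key"))
            if right is not None and key >= right:
                problems.append((i, key, "key >= right dividing key"))
    return problems


def repaired_dividing_key(left_child, right_child):
    """Return a key that partitions the two children, or None if they overlap."""
    if left_child and right_child and max(left_child) < min(right_child):
        # The smallest key on the right is one valid choice of divider.
        return min(right_child)
    return None


children = [[b"apple", b"kiwi"], [b"lemon", b"pear"]]
bad = [b"banana"]  # does not separate the two children
print(check_partition(bad, children))
fixed = [repaired_dividing_key(children[0], children[1])]
print(check_partition(fixed, children))
```

The repair only works if the two children really are disjoint, which is the "assuming they are actually partitioned" caveat above; if they overlap, the helper returns None and there is no single key to patch in.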

Any tips for what to do in hexl-mode in Emacs? Or another way, I guess:

    $ ls -ltrh termlist.*
    -rw-r--r-- 1 db db 1.3M Nov 17 10:57 termlist.baseB
    -rw-r--r-- 1 db db  78G Nov 17 10:57 termlist.DB
    -rw-r--r-- 1 db db 1.3M Nov 17 10:57 termlist.baseA
    $ 

:-)

>>     position:
>>     baseA blocksize=8K items=13525517523 lastblock=118173891 revision=5434326 levels=4 root=6551
>>     Failed to check B-tree: DatabaseError: Table entry count says 13525517523 but actually counted 13525517278
>
> This error is just that the record of how many entries there are is
> wrong (this count is stored since it's useful to know in some cases, and
> expensive to compute by scanning the whole table).  It shouldn't get out
> of step with the actual number of entries in the table, but since no
> other errors are reported just fixing the metadata record seems
> reasonable.

Ok, sounds good. If you have a tip on how to fix the metadata here too,
I would be grateful.
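
For scale, the two numbers in that message differ by only 245 entries out of roughly 13.5 billion. A small Python sketch that pulls the figures out of the error text and computes the difference (the helper name count_mismatch is made up here; the message is the one quoted above):

```python
import re

MESSAGE = ("DatabaseError: Table entry count says 13525517523 "
           "but actually counted 13525517278")


def count_mismatch(message):
    """Return (recorded, counted, recorded - counted), or None if no match."""
    m = re.search(r"count says (\d+) but actually counted (\d+)", message)
    if m is None:
        return None
    recorded, counted = int(m.group(1)), int(m.group(2))
    return recorded, counted, recorded - counted


print(count_mismatch(MESSAGE))  # (13525517523, 13525517278, 245)
```

This only reads the numbers off the check output; it does not touch the database or say where in the base file the count is stored.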

[...]

> We recommend migrating off chert so perhaps reindexing is a good plan
> anyway.  I can see it may not be very appealing with a 1.1TB database
> though.

Yeah. Unfortunately the resources aren't really there to migrate, as
this is a long running project (started 2006) which is being replaced
soon(ish).

[...]

> It shouldn't really be possible for the program to cause a corrupt
> database like this (except for program bugs like stray memory writes
> into memory Xapian has allocated, or the program writing to file
> descriptors which Xapian has open for writing on the database, etc).

We're using the Perl bindings, so coming from a somewhat "safe space",
but... we did introduce a new way of making our indexing fail, so...

> However, the way a chert commit happens involves trying to stitch
> together per-table atomic commits to make a per-database atomic commit,
> which means we need to recover if some tables have committed and others
> haven't - that's fiddly to do and we've found bugs there before.  It
> could be you've hit another one.

Unfortunately I can't provide any good information on what/when/how, so
if indeed we did, there isn't anything to go on.

> In glass we replaced this whole mechanism with a new one which gives a
> per-database atomic commit directly.

Nice!

  Thanks for the thorough answer,

    Adam

-- 
 "See? See! Starboard is right! Port is left!"              Adam Sjøgren
 "Ok, so I was wrong for once in my life! Shut up."    asjo at koldfront.dk