Strange index consistency issue

Olly Betts olly at survex.com
Sat Jan 9 09:19:24 GMT 2016


On Fri, Jan 08, 2016 at 08:11:48AM +0100, Jean-Francois Dockes wrote:
> A Recoll user is reporting an index corruption problem. In general, index
> corruption happens from time to time with Recoll, because of crashes,
> reboots, misc Recoll bugs, etc.
> 
> The strange thing here is that xapian-check does not seem to detect anything.

Checking the database checks the B-tree structure, checks that the contents
of most of the tables make sense, and does some cross-checking between
tables, but the latter in particular is far from exhaustive.

Looking at the exception message, if it is lacking a trailing '.' (as
quoted below), then it's a corrupted entry (or chunk) in the list of
document lengths, but if it has a trailing '.', then it's a missing entry
in the record table.  (I'm not sure if this punctuation difference was a
fiendishly cunning deliberate plan or careless inconsistency...)
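
To spell that rule out, here is a minimal sketch in Python.  It only
encodes the punctuation difference described above; the function name is
made up for illustration and is not part of Xapian, and the rule may not
hold for other Xapian versions.

```python
def suspect_table(message: str) -> str:
    """Guess which table is damaged from a DocNotFoundError message.

    Hypothetical helper: it encodes only the trailing-'.' rule described
    in this email and is not part of any Xapian API.
    """
    if message.rstrip().endswith("."):
        # Trailing '.': the record table has no entry for the document.
        return "record table (missing entry)"
    # No trailing '.': the document length list entry (or chunk) is bad.
    return "document length list (corrupted entry or chunk)"


if __name__ == "__main__":
    print(suspect_table("Document 6 not found"))
    print(suspect_table("Document 6 not found."))
```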

We probably ought to cross-check the two - that shouldn't be costly to
do.
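
Roughly, such a cross-check is a comparison of two sets of document IDs.
The sketch below is not code from Xapian: the function name is
hypothetical, and it assumes you have already obtained the IDs from the
document length list and from the record table as plain Python iterables.

```python
def cross_check(doclen_ids, record_ids):
    """Compare document IDs seen in two tables.

    Hypothetical helper, not a Xapian API.  Returns a pair of sorted lists:
    IDs with a document length but no record, and IDs with a record but no
    document length.
    """
    doclen_ids = set(doclen_ids)
    record_ids = set(record_ids)
    missing_records = sorted(doclen_ids - record_ids)
    missing_lengths = sorted(record_ids - doclen_ids)
    return missing_records, missing_lengths


if __name__ == "__main__":
    # Made-up IDs: document 6 has a length entry but no record.
    no_record, no_length = cross_check([1, 2, 6, 7], [1, 2, 7])
    print("length but no record:", no_record)
    print("record but no length:", no_length)
```

A document reported in the first list would be a candidate for the kind
of DocNotFoundError shown further down.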

> This is with Xapian 1.2.16

My guess is that the corruption is caused by the same bug as #645, which
was fixed in 1.2.21.

>     I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
>     which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
>     terms, which included 'term' and seemed to be reasonable for a document. I
>     then ran "delve -r 6 ./xapiandb -d" and got the following:
> 
>     Data for record #6:
> 
>     Error: DocNotFoundError: Document 6 not found

Hmm, if you're getting it with '-d' there, that makes me suspect a
missing record table entry.

> To repeat, the issue here is not that the index is corrupted, but that
> xapian-check does not see it. Is there some more thorough test which could
> be run ?

You could try:

delve -t '' ./xapiandb

That will list the document lengths, so you can see if document 6 is in
that list or not.

Cheers,
    Olly


