Strange index consistency issue
olly at survex.com
Sat Jan 9 09:19:24 GMT 2016
On Fri, Jan 08, 2016 at 08:11:48AM +0100, Jean-Francois Dockes wrote:
> A Recoll user is reporting an index corruption problem. In general, index
> corruption happens from time to time with Recoll, because of crashes,
> reboots, misc Recoll bugs, etc.
> The strange thing here is that xapian-check does not seem to detect anything.
Checking the database checks the B-tree structure, checks that the
contents of most of the tables make sense, and does some cross-checking
between tables, but the latter in particular is far from exhaustive.
Looking at the exception message, if it is lacking a trailing '.' (as
quoted below), then it's a corrupted entry (or chunk) in the list of
document lengths, but if it has a trailing '.', then it's a missing entry
in the record table. (I'm not sure if this punctuation difference was a
fiendishly cunning deliberate plan or careless inconsistency...)
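As a rough illustration of that rule, here is a minimal Python sketch that
sorts a DocNotFoundError message into one of the two cases purely by its
trailing punctuation. The function name and the returned labels are
hypothetical (they are not part of Xapian), and the rule itself is only as
described above for this version, so treat the result as a hint rather than
a diagnosis.

```python
def classify_doc_not_found(message):
    """Guess which table a DocNotFoundError message points at.

    Hypothetical helper, not a Xapian API. It only applies the
    punctuation rule described in the text:
      - no trailing '.'  -> corrupted entry/chunk in the document lengths
      - trailing '.'     -> missing entry in the record table
    """
    text = message.rstrip()  # ignore trailing whitespace and newlines
    if text.endswith("."):
        return "record table: missing entry"
    return "document lengths: corrupted entry or chunk"


if __name__ == "__main__":
    # Message text as quoted in this thread (no trailing '.').
    print(classify_doc_not_found("Document 6 not found"))
```

The check is deliberately limited to the last character, since that is the
only difference between the two messages mentioned here.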
We probably ought to cross-check the two - that shouldn't be costly to
> This is with Xapian 1.2.16
My guess is that the corruption is caused by the same bug as #645, which
was fixed in 1.2.21.
> I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
> which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
> terms, which included 'term' and seemed to be reasonable for a document I
> then ran "delve -r 6 ./xapiandb -d" and got the following:
> Data for record #6:
> Error: DocNotFoundError: Document 6 not found
Hmm, if you're getting it with '-d' there, that makes me suspect a
missing record table entry.
> To repeat, the issue here is not that the index is corrupted, but that
> xapian-check does not see it. Is there some more thorough test which could
> be run ?
You could try:
delve -t '' ./xapiandb
That will list the document lengths, so you can see if document 6 is in
that list or not.
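If you want to automate that comparison, the idea can be sketched in Python
as a plain set difference between the document IDs which have a length entry
and the document IDs whose record can be read. This is only a sketch: the
function name and result keys are hypothetical, and how you collect the two
ID lists (for example from the delve output above) is left as an assumption.

```python
def cross_check_docids(doclen_ids, record_ids):
    """Compare two collections of document IDs.

    Hypothetical helper, not part of Xapian or delve.
    doclen_ids: IDs which appear in the list of document lengths.
    record_ids: IDs whose record table entry could be read.
    Returns the IDs which are present on one side only.
    """
    doclen_ids = set(doclen_ids)
    record_ids = set(record_ids)
    return {
        # Has a length entry, but no readable record.
        "missing_record": sorted(doclen_ids - record_ids),
        # Has a record, but no length entry.
        "missing_doclen": sorted(record_ids - doclen_ids),
    }


if __name__ == "__main__":
    # Made-up IDs for illustration: document 6 has a length but no record.
    print(cross_check_docids([1, 2, 6, 7], [1, 2, 7]))
```

An ID showing up under "missing_record" would match the situation where the
document is in the length list but fetching its data fails.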