Strange index consistency issue
Jean-Francois Dockes
jf at dockes.org
Fri Jan 8 07:11:48 GMT 2016
Hi,
A Recoll user is reporting an index corruption problem. Index corruption
does happen from time to time with Recoll, because of crashes, reboots,
assorted Recoll bugs, etc.
The strange thing here is that xapian-check does not seem to detect anything.
In a nutshell, some document numbers seem to point into a data black hole:
the docids are returned when searching for the unique term identifying the
file/document, but get_document() then fails. A later replace_document()
succeeds, but the same issue reappears on the next indexing pass.
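// success: the posting list for the unique term yields a docid
Xapian::PostingIterator docid = db.postlist_begin(uniterm);
// then failure: this throws Xapian::DocNotFoundError
Xapian::Document xdoc = db.get_document(*docid);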
In this situation, Recoll tries to update the document. replace_document()
succeeds, and the whole cycle repeats on the next indexing pass.
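To make the failure pattern concrete, here is a toy in-memory model of the two-step access described above. It is only a sketch that uses standard containers. ToyIndex, Lookup and lookup_by_uniterm are hypothetical names, and the model is a drastic simplification, not Xapian's actual storage. It separates a document that was never indexed from one whose docid can be reached through its unique term but has no record data. The second case is the "black hole".

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, NOT Xapian's API or on-disk format:
// postlist maps a term to docids, record maps a docid to document data.
struct ToyIndex {
    std::map<std::string, std::vector<unsigned>> postlist;
    std::map<unsigned, std::string> record;
};

enum class Lookup { NotIndexed, Found, Dangling };

// Mirrors the two-step access in the report: first find the docid through
// the unique identifying term, then fetch the data stored for that docid.
Lookup lookup_by_uniterm(const ToyIndex& db, const std::string& uniterm) {
    auto pl = db.postlist.find(uniterm);
    if (pl == db.postlist.end() || pl->second.empty())
        return Lookup::NotIndexed;          // term absent: never indexed
    // Assumes a unique term indexes exactly one document.
    unsigned docid = pl->second.front();    // step 1: a docid is returned
    if (db.record.count(docid) == 0)
        return Lookup::Dangling;            // step 2 fails: no record data
    return Lookup::Found;
}
```

In the real database, the Dangling case corresponds to get_document() throwing Xapian::DocNotFoundError, which is what the delve output below shows.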
This is with Xapian 1.2.16.
Here is a slightly edited version of the user's report on experiments run
with xapian-check and delve alone:
Recoll 1.21.3 + Xapian 1.2.16 with two external indices (on a network
server) and one local index. The setup has been running fine for weeks, and
the external indices are updated overnight by a cron job.
Today I searched for a term that I know occurs in many documents and is
normally found (my last name). No documents were found in the Recoll GUI.
I then searched one external index from the command line with
"recoll -c -t -q term" and received the following response:
:2:../rcldb/rclquery.cpp:358:xenquire->get_mset: exception: Document 6 not found
Recoll query: ((term...)) -1 results
:2:../rcldb/rclquery.cpp:392:enquire->get_mset: exception: Document 6 not found
I then went through the checks as above (after installing xapian-tools). I
ran xapian-check on both external indices, and neither reported a problem.
I then ran "delve -t term ./xapiandb" and got a long list of IDs, one of
which was 6. Next I ran "delve -r 6 ./xapiandb" and got a long list of
terms, which included 'term' and seemed reasonable for a document. Finally I
ran "delve -r 6 ./xapiandb -d" and got the following:
Data for record #6:
Error: DocNotFoundError: Document 6 not found
And the output from xapian-check:
============
record:
baseB blocksize=8K items=84507 lastblock=3379 revision=157 levels=2 root=12
B-tree checked okay
record table structure checked OK
termlist:
baseB blocksize=8K items=169014 lastblock=24090 revision=157 levels=2 root=5
B-tree checked okay
termlist table structure checked OK
postlist:
baseB blocksize=8K items=8727966 lastblock=66596 revision=157 levels=3 root=113
B-tree checked okay
postlist table structure checked OK
position:
baseB blocksize=8K items=34905667 lastblock=109114 revision=157 levels=2 root=11167
B-tree checked okay
position table structure checked OK
spelling:
Lazily created, and not yet used.
synonym:
baseB blocksize=8K items=255128 lastblock=4844 revision=157 levels=2 root=2
B-tree checked okay
synonym table: Don't know how to check structure
No errors found
=============
The whole report is here:
https://bitbucket.org/medoc/recoll/issues/257/query-returns-no-results-when-document-is
Look for the 'Bob Cargill' section. Unfortunately, the issue was appended
to an older one (also a corruption, but one detected by xapian-check, so
nothing extraordinary there).
To repeat, the issue here is not that the index is corrupted, but that
xapian-check does not see it. Is there a more thorough test that could
be run?
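As a rough sketch of the kind of cross-table test being asked about, one could walk every posting list and confirm that each docid found there also has a record. The toy model below shows only the idea, using standard containers. Postlists, Records and dangling_docids are hypothetical names, and this is not Xapian code. Against a real database, the same walk could presumably use Database::allterms_begin()/allterms_end(), postlist_begin()/postlist_end() and get_document(), catching Xapian::DocNotFoundError. That variant is untested here. It would also be expensive on an index this size: the postlist table above reports 8727966 items.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, NOT Xapian's storage:
// term -> posting list (docids), and docid -> record data.
using Postlists = std::map<std::string, std::vector<unsigned>>;
using Records = std::map<unsigned, std::string>;

// Cross-table check: every docid appearing in any posting list should have
// an entry in the record table. Returns the docids that do not.
std::set<unsigned> dangling_docids(const Postlists& postlists,
                                   const Records& records) {
    std::set<unsigned> missing;
    for (const auto& entry : postlists) {
        for (unsigned docid : entry.second) {
            if (records.find(docid) == records.end())
                missing.insert(docid);   // reachable by term, but no data
        }
    }
    return missing;
}
```

The same comparison could be run between the termlist and record tables, since "delve -r 6" above found terms for a document whose data was missing.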
Cheers,
J.F. Dockes
More information about the Xapian-discuss
mailing list