Strange index consistency issue

Jean-Francois Dockes jf at dockes.org
Fri Jan 8 07:11:48 GMT 2016


Hi,

A Recoll user is reporting an index corruption problem. In general, index
corruption happens from time to time with Recoll, because of crashes,
reboots, misc Recoll bugs, etc.

The strange thing here is that xapian-check does not seem to detect anything.

In a nutshell, some document numbers seem to point to a data blackhole: the
docids are returned when searching for the file/doc unique identifying
term, but then get_document() fails. A later replace_document() succeeds,
but on the next indexing pass, same issue.

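          // succeeds: the unique term has a posting list entry
          Xapian::PostingIterator docid = db.postlist_begin(uniterm);
          // then fails with DocNotFoundError:
          Xapian::Document xdoc = db.get_document(*docid);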

In this situation, Recoll tries to update the document. replace_document()
then succeeds, but the same failure recurs on the next indexing pass.
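
As a rough illustration (this is not Recoll's actual code), the cross-check
that exposes the problem can be sketched as below: walk the posting list of a
term and try to load every docid it yields. The function name, the template
parameters and the FakeDb stand-in are assumptions made so that the sketch
compiles without Xapian. With Xapian, Db would be Xapian::Database and
NotFound would be Xapian::DocNotFoundError.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Walk the posting list of `term` and try to load every docid it yields.
// Db must offer postlist_begin(term), postlist_end(term) and
// get_document(docid), as Xapian::Database does. NotFound is the exception
// type thrown for a missing record.
template <typename NotFound, typename Db>
std::vector<unsigned> unreadable_docids(const Db& db, const std::string& term) {
    std::vector<unsigned> bad;
    for (auto it = db.postlist_begin(term); it != db.postlist_end(term); ++it) {
        try {
            (void)db.get_document(*it);  // throws if the record is missing
        } catch (const NotFound&) {
            bad.push_back(*it);          // posted under the term, no record
        }
    }
    return bad;
}

// Hypothetical in-memory stand-in, only so the sketch builds without Xapian.
struct FakeDb {
    std::map<std::string, std::vector<unsigned>> postings;  // term -> docids
    std::map<unsigned, std::string> records;                // docid -> data

    std::vector<unsigned>::const_iterator
    postlist_begin(const std::string& t) const { return postings.at(t).begin(); }

    std::vector<unsigned>::const_iterator
    postlist_end(const std::string& t) const { return postings.at(t).end(); }

    // std::map::at throws std::out_of_range for a missing record.
    const std::string& get_document(unsigned id) const { return records.at(id); }
};

// Docid 6 is posted under "term" but has no record, as in the report.
inline std::vector<unsigned> demo() {
    FakeDb db;
    db.postings["term"] = {5, 6, 7};
    db.records[5] = "doc five";
    db.records[7] = "doc seven";
    std::vector<unsigned> bad = unreadable_docids<std::out_of_range>(db, "term");
    assert(bad.size() == 1 && bad[0] == 6);
    return bad;
}
```

Against a real index, the call would presumably look like
unreadable_docids<Xapian::DocNotFoundError>(db, uniterm). Passing an empty
term should walk every document, since Xapian treats the empty term's posting
list as the list of all docids, but I have not checked that this also holds
on an index in the state described here.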

This is with Xapian 1.2.16.

Here follows a slightly edited version of what the user reports about
experiments run with pure xapian-check/delve:

    Recoll 1.21.3 + Xapian 1.2.16 with two external indices (on a network
    server) and one local index. The setup has been running fine for weeks and
    the external indices update on a cron job overnight. 

    Today, I searched for a term that I know is in many documents and can be
    found (my last name). No documents were found in the gui Recoll. 

    I then searched in one external index on the command line
    "recoll -c -t -q term" and received the following response: 

    :2:../rcldb/rclquery.cpp:358:xenquire->get_mset: exception: Document 6 not
       found Recoll query: ((term...)) -1 results
    :2:../rcldb/rclquery.cpp:392:enquire->get_mset: exception: Document 6 not
       found 

    I then went through and checked as above (after installing xapian-tools). I
    ran the xapian-check on both external indices and both had no problems.

    I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
    which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
    terms, which included 'term' and seemed reasonable for a document. I
    then ran "delve -r 6 ./xapiandb -d" and got the following:

    Data for record #6:

    Error: DocNotFoundError: Document 6 not found

And the output from xapian-check:

============
record:
baseB blocksize=8K items=84507 lastblock=3379 revision=157 levels=2 root=12
B-tree checked okay
record table structure checked OK

termlist:
baseB blocksize=8K items=169014 lastblock=24090 revision=157 levels=2 root=5
B-tree checked okay
termlist table structure checked OK

postlist:
baseB blocksize=8K items=8727966 lastblock=66596 revision=157 levels=3 root=113
B-tree checked okay
postlist table structure checked OK

position:
baseB blocksize=8K items=34905667 lastblock=109114 revision=157 levels=2 root=11167
B-tree checked okay
position table structure checked OK

spelling:
Lazily created, and not yet used.

synonym:
baseB blocksize=8K items=255128 lastblock=4844 revision=157 levels=2 root=2
B-tree checked okay
synonym table: Don't know how to check structure

No errors found
=============


The whole report is here:
https://bitbucket.org/medoc/recoll/issues/257/query-returns-no-results-when-document-is

Look for the 'Bob Cargill' section. Unfortunately, the issue was appended
to an older one (also a corruption, but one detected by xapian-check, so
nothing extraordinary there).

To repeat, the issue here is not that the index is corrupted, but that
xapian-check does not see it. Is there some more thorough test which could
be run?

Cheers,

J.F. Dockes


