[Xapian-tickets] [Xapian] #645: Read block errors after reopen()

Sat Jun 7 10:46:53 BST 2014

#645: Read block errors after reopen()
---------------------------+------------------
        Reporter:  medoc   |      Owner:  olly
            Type:  defect  |     Status:  new
        Priority:  low     |  Milestone:
       Component:  Other   |    Version:
        Severity:  normal  |   Keywords:
      Blocked By:          |   Blocking:
Operating System:  All     |
---------------------------+------------------
 Because of an ancient glitch (index flushes triggered by a query call),
 the Recoll indexing process uses 2 separately opened Database objects
 during indexing: one for updating the index, and the other one, readonly,
 for querying (mostly up-to-date signature values).

 The queries using the readonly Database object get DatabaseModified
 exceptions, and call reopen() before retrying. This works well in general.
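
 For reference, the retry pattern described above looks roughly like the
 sketch below. This is not the actual Recoll code. To keep it compilable
 without libxapian, DatabaseModifiedError and FakeReadonlyDb are stand-ins
 (assumptions) for Xapian::DatabaseModifiedError and a read-only
 Xapian::Database, and with_reopen_retry is a hypothetical helper name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for Xapian::DatabaseModifiedError (assumption, so that the
// sketch builds without libxapian).
struct DatabaseModifiedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical read-only handle. reopen() stands in for
// Xapian::Database::reopen().
struct FakeReadonlyDb {
    int revision = 0;      // revision this handle currently sees
    int latest = 2;        // revision committed by the writer
    int reopen_calls = 0;  // how many times reopen() was called

    void reopen() { revision = latest; ++reopen_calls; }

    std::string get_value(unsigned slot) const {
        if (revision != latest)
            throw DatabaseModifiedError("database modified");
        return "sig" + std::to_string(slot);
    }
};

// Run op(db). On DatabaseModifiedError, reopen the handle and retry, at
// most max_retries times. Any other exception propagates to the caller.
template <typename Db, typename Op>
auto with_reopen_retry(Db& db, Op op, int max_retries = 3)
    -> decltype(op(db)) {
    for (int attempt = 0; ; ++attempt) {
        try {
            return op(db);
        } catch (const DatabaseModifiedError&) {
            if (attempt >= max_retries) throw;
            db.reopen();
        }
    }
}
```

 A caller would wrap each query in the helper, for example
 with_reopen_retry(db, [](FakeReadonlyDb& d) { return d.get_value(7); }).
 Only the "modified" exception is retried in this sketch, so block read
 errors like the ones reported below would reach the caller unchanged.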

 However, there are very rare cases where queries happening after the
 reopen() get other Xapian exceptions, like:

 Expected block 0 to be level 1, not 0

 Error reading block xxx: got end of file

 I have also seen the process stuck in an infinite loop somewhere in the
 following call stack (probably near the bottom, as I never get a shorter
 stack with Ctrl-C / continue inside gdb):
 {{{
 #0  __memcmp_ssse3 () at ../sysdeps/x86_64/multiarch/memcmp-ssse3.S:40
 #1  0x00007fd81e3efbe0 in Key::operator<(Key) const () from /usr/lib/libxapian.so.22
 #2  0x00007fd81e3efca8 in ChertTable::find_in_block(unsigned char const*, Key, bool, int) () from /usr/lib/libxapian.so.22
 #3  0x00007fd81e3f0cc3 in ChertTable::find(Cursor*) const () from /usr/lib/libxapian.so.22
 #4  0x00007fd81e3ccc69 in ChertCursor::find_entry(std::string const&) () from /usr/lib/libxapian.so.22
 #5  0x00007fd81e3f7283 in ?? () from /usr/lib/libxapian.so.22
 #6  0x00007fd81e3fb59b in ?? () from /usr/lib/libxapian.so.22
 #7  0x00007fd81e3dc3fa in ?? () from /usr/lib/libxapian.so.22
 #8  0x00007fd81e3538a6 in Xapian::Document::Internal::get_value(unsigned int) const () from /usr/lib/libxapian.so.22
 #9  0x00007fd81e35390c in Xapian::Document::get_value(unsigned int) const () from /usr/lib/libxapian.so.22
 #10 0x00007fd81f2f6bcb in Rcl::Db::needUpdate (this=0x1cee5f0, udi=..., sig=..., existed=existed@entry=0x7fff7edcddc8) at ../rcldb/rcldb.cpp:1762
 ...
 }}}

 This all happens while the Recoll 1.19.13 indexer is running
 SINGLE-THREADED, and I could reproduce it with Xapian 1.2.8 and 1.2.16.

 It happens that all known cases occurred on machines using SSDs, and it
 seems that the problem is easier to reproduce with a relatively slow CPU.
 I tried quite hard to reproduce the issue on a spinning disk system, with
 no luck. This might indicate that timing is somehow relevant. Also, all
 cases were on Ubuntu, either 12.04 or 14.04.

 The original reporting user, who can reproduce the issue quite frequently,
 uses a 2006 Macbook with ext4 on an SSD, and Ubuntu Trusty.

 Changing the code so that the query db object is a copy of the update one
 instead of being separately opened makes the problem disappear, and I'll
 commit this change, as the reason for using two db objects has been gone
 for many years.

 It is quite possible that the Recoll code is incorrect again. I have no
 simple program to reproduce the issue, and the single db object workaround
 is actually an improvement of the code, so I am creating this report more
 as a reference point than as a request for a fix.

--
Ticket URL: <http://trac.xapian.org/ticket/645>
Xapian <http://xapian.org/>