[Xapian-tickets] [Xapian] #645: Read block errors after reopen()
Xapian
nobody at xapian.org
Sat Jun 7 10:46:53 BST 2014
#645: Read block errors after reopen()
---------------------------+------------------
Reporter: medoc | Owner: olly
Type: defect | Status: new
Priority: low | Milestone:
Component: Other | Version:
Severity: normal | Keywords:
Blocked By: | Blocking:
Operating System: All |
---------------------------+------------------
Because of an ancient glitch (index flushes triggered by a query call),
the Recoll indexing process uses two separately opened Database objects
during indexing: one for updating the index, and the other, read-only,
for querying (mostly up-to-date signature values).
The queries using the read-only Database object get DatabaseModifiedError
exceptions, and call reopen() before retrying. This works well in general.
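For reference, the reopen-and-retry pattern described above can be sketched
roughly as follows. This is not the actual Recoll code: FakeReadDb,
ModifiedError and with_reopen are hypothetical stand-ins so that the example
compiles without Xapian. With the real library the corresponding pieces
would be Xapian::Database, its reopen() method, and
Xapian::DatabaseModifiedError.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Xapian::DatabaseModifiedError.
struct ModifiedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a read-only database handle. It reads from the
// revision it was opened at and fails once the writer has moved on.
struct FakeReadDb {
    int snapshot;    // revision this handle currently sees
    int* current;    // revision the writer has reached
    explicit FakeReadDb(int* cur) : snapshot(*cur), current(cur) {}
    void reopen() { snapshot = *current; }
    std::string get_value(unsigned slot) const {
        if (snapshot != *current) throw ModifiedError("database modified");
        return "sig-" + std::to_string(slot) + "@r" + std::to_string(snapshot);
    }
};

// Run op(db). If it throws ModifiedErr, call db.reopen() and try again,
// giving up (and rethrowing) after max_retries reopen attempts.
template <class ModifiedErr, class Db, class Op>
auto with_reopen(Db& db, Op op, int max_retries = 3) -> decltype(op(db)) {
    assert(max_retries >= 0);
    for (int attempt = 0;; ++attempt) {
        try {
            return op(db);
        } catch (const ModifiedErr&) {
            if (attempt >= max_retries) throw;
            db.reopen();
        }
    }
}
```

A query would then be wrapped as, for example,
with_reopen<ModifiedError>(db, [](FakeReadDb& d) { return d.get_value(0); }).
Only the "modified" exception is caught here, so any other error (such as
the block read errors reported below) still propagates to the caller.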
However, there are very rare cases where queries happening after the
reopen() get other Xapian exceptions, like:
  Expected block 0 to be level 1, not 0
  Error reading block xxx: got end of file
I have also seen the process stuck in an infinite loop somewhere in the
following call stack (probably near the bottom, as I never get a shorter
stack with Ctrl-C / continue inside gdb).
{{{
#0 __memcmp_ssse3 () at ../sysdeps/x86_64/multiarch/memcmp-ssse3.S:40
#1 0x00007fd81e3efbe0 in Key::operator<(Key) const () from /usr/lib/libxapian.so.22
#2 0x00007fd81e3efca8 in ChertTable::find_in_block(unsigned char const*, Key, bool, int) () from /usr/lib/libxapian.so.22
#3 0x00007fd81e3f0cc3 in ChertTable::find(Cursor*) const () from /usr/lib/libxapian.so.22
#4 0x00007fd81e3ccc69 in ChertCursor::find_entry(std::string const&) () from /usr/lib/libxapian.so.22
#5 0x00007fd81e3f7283 in ?? () from /usr/lib/libxapian.so.22
#6 0x00007fd81e3fb59b in ?? () from /usr/lib/libxapian.so.22
#7 0x00007fd81e3dc3fa in ?? () from /usr/lib/libxapian.so.22
#8 0x00007fd81e3538a6 in Xapian::Document::Internal::get_value(unsigned int) const () from /usr/lib/libxapian.so.22
#9 0x00007fd81e35390c in Xapian::Document::get_value(unsigned int) const () from /usr/lib/libxapian.so.22
#10 0x00007fd81f2f6bcb in Rcl::Db::needUpdate (this=0x1cee5f0, udi=..., sig=..., existed=existed@entry=0x7fff7edcddc8) at ../rcldb/rcldb.cpp:1762
...
}}}
This all happens while the Recoll 1.19.13 indexer is running
single-threaded, and I could reproduce it with Xapian 1.2.8 and 1.2.16.
All known cases occurred on machines using SSDs, and the problem seems
easier to reproduce with a relatively slow CPU. I tried quite hard to
reproduce the issue on a spinning disk system, with no luck. This might
indicate that timing is somehow relevant. Also, all cases were on Ubuntu,
either 12.04 or 14.04.
The original reporting user, who can reproduce the issue quite frequently,
uses a 2006 MacBook with ext4 on an SSD, and Ubuntu Trusty.
Changing the code so that the query db object is a copy of the update one
instead of being separately opened makes the problem disappear, and I'll
commit this change, as the reason for using two db objects has been gone
for many years.
It is quite possible that the Recoll code is incorrect again. I have no
simple program to reproduce the issue, and the single db object workaround
is actually an improvement of the code, so I am creating this report more
as a reference point than as a request for a fix.
--
Ticket URL: <http://trac.xapian.org/ticket/645>
Xapian <http://xapian.org/>