Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Jean-Francois Dockes
jf at dockes.org
Mon May 22 06:45:59 BST 2017
Olly Betts writes:
> On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote:
> > I have a user reporting the following error during recoll indexing:
> >
> > flush() failed: Db block overwritten - are there multiple writers?
> >
> > "flush() failed" is from recoll; the rest is, I think, the text of the Xapian
> > exception.
> >
> > This is with Xapian 1.4.3 on Linux (I've asked for more details, which
> > should be coming).
> >
> > I don't think that I've ever seen this error, and I also don't think that
> > there have been significant changes to recoll in this area, but as usual, I
> > may be wrong.
>
> What this means is that the database appears to have a child block which
> is newer than its parent block (in the real world children are younger
> than their parents, but in current Xapian DBs the reverse should be the
> case - blocks are copied on write and the parent block points to its
> children, so needs updating whenever any of its children are).
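The invariant Olly describes can be modelled in a few lines of Python. This is a minimal sketch, assuming each block records the revision at which it was last written and a parent lists its children by block number. The names `Block` and `check_path` and the two exception classes are hypothetical stand-ins. Xapian's real check lives in its C++ B-tree code and is not reproduced here.

```python
from dataclasses import dataclass, field


class DatabaseCorruptError(Exception):
    """Hypothetical stand-in: a *writer* found a child newer than its parent."""


class DatabaseModifiedError(Exception):
    """Hypothetical stand-in: a *reader* found a child newer than its parent."""


@dataclass
class Block:
    revision: int                                  # revision at which this block was last written
    children: list = field(default_factory=list)  # block numbers this block points to


def check_path(blocks, path, writable):
    """Walk a root-to-leaf path of block numbers and check that no child is
    newer than its parent. With copy-on-write, rewriting a child forces the
    parent to be rewritten too, so parent.revision >= child.revision."""
    for parent_no, child_no in zip(path, path[1:]):
        parent, child = blocks[parent_no], blocks[child_no]
        if child_no not in parent.children:
            raise ValueError(f"block {child_no} is not a child of block {parent_no}")
        if child.revision > parent.revision:
            if writable:
                # A writer should never see this unless something else wrote the file.
                raise DatabaseCorruptError("Db block overwritten - are there multiple writers?")
            # A reader can see this if a writer raced it between the two reads.
            raise DatabaseModifiedError("tree changed between reading parent and child")
```

For example, `check_path({1: Block(5, [2]), 2: Block(5)}, [1, 2], writable=True)` passes, but if block 2 is rewritten at revision 6 without block 1 being updated, the same call raises `DatabaseCorruptError`.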
>
> When reading a database, this is possible if a writer has updated that
> part of the tree between reading the parent and reading the child (and
> gives DatabaseModifiedError).
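On the reading side, the usual response to that error is to reopen the database at the latest revision and retry the operation. The sketch below shows that loop with stand-in types. In the Python bindings the corresponding real names would be `xapian.DatabaseModifiedError` and `Database.reopen()`, but `with_retry`, `FakeReader`, and `max_tries` are hypothetical, chosen only to make the example self-contained.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for xapian.DatabaseModifiedError."""


def with_retry(db, operation, max_tries=3):
    """Run operation(db). If a concurrent writer invalidated the revision
    being read, reopen the database and retry, up to max_tries attempts."""
    for attempt in range(max_tries):
        try:
            return operation(db)
        except DatabaseModifiedError:
            if attempt == max_tries - 1:
                raise            # give up and let the caller see the error
            db.reopen()          # move to the latest committed revision


class FakeReader:
    """Hypothetical reader whose first `failures` reads fail as if a writer raced them."""

    def __init__(self, failures):
        self.failures = failures
        self.reopens = 0

    def reopen(self):
        self.reopens += 1

    def doc_count(self):
        if self.failures > 0:
            self.failures -= 1
            raise DatabaseModifiedError("revision discarded")
        return 42
```

With `FakeReader(failures=1)`, `with_retry(r, lambda d: d.doc_count())` returns 42 after one reopen.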
>
> When writing, this shouldn't happen.
>
> As the error suggests, if you manage to get multiple concurrent writers
> this could happen. There's locking which should prevent this, but that
> can be defeated if the lock file is deleted (which people sometimes add
> code to do, misunderstanding how the lock file is used - fcntl() locking
> is used, and the lock file should always be present).
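Why deleting the lock file defeats the locking can be seen in a small Python experiment (Unix only). The lock is attached to the file's inode, so once the file is unlinked, a process that recreates it locks a brand-new inode and nobody is refused. This sketch uses `flock()` rather than the `fcntl()` locking Olly mentions, because `fcntl()` record locks are owned by the whole process and two opens inside one test process would not conflict; the inode-level point is the same. The file name `db.lock` and the helper names are hypothetical.

```python
import errno
import fcntl
import os
import shutil
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and try to take an exclusive lock
    without blocking. Returns the fd on success, or None if already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
            return None
        raise
    return fd


def demo_deleted_lock_file():
    d = tempfile.mkdtemp()
    lock_path = os.path.join(d, "db.lock")  # hypothetical lock file name
    try:
        first = try_lock(lock_path)   # "writer 1" takes the lock
        second = try_lock(lock_path)  # "writer 2" is refused: same inode, lock held
        os.unlink(lock_path)          # the mistake: "cleaning up" the lock file
        third = try_lock(lock_path)   # new file, new inode: lock granted anyway
        for fd in (first, second, third):
            if fd is not None:
                os.close(fd)
        return first is not None, second is None, third is not None
    finally:
        shutil.rmtree(d)
```

`demo_deleted_lock_file()` returns `(True, True, True)`: the second writer is correctly refused, but after the unlink a third one gets in while the first still believes it holds the lock.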
I don't think that there is code in Recoll doing this. Recoll also has its
own protection against multiple writer processes, and in the normal
configuration, a single thread uses the WritableDatabase. It's also
possible to set things up for multiple writing threads though (with lock
protection in this case). I've asked the user to confirm the thread
configuration.
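The multi-threaded setup described here, one writable database shared by several threads behind a lock, can be sketched in Python as below. Recoll itself is C++ and this does not reproduce its code: `SerializedWriter` and `FakeDb` are hypothetical, and only the method names `add_document` and `commit` mirror Xapian's `WritableDatabase`.

```python
import threading


class SerializedWriter:
    """Hypothetical wrapper: lets several worker threads share one writable
    database object by holding a mutex around every call into it."""

    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def add_document(self, doc):
        with self._lock:
            return self._db.add_document(doc)

    def commit(self):
        with self._lock:
            self._db.commit()


class FakeDb:
    """Hypothetical stand-in for a writable database; not thread-safe by itself."""

    def __init__(self):
        self.docs = []
        self.commits = 0

    def add_document(self, doc):
        self.docs.append(doc)
        return len(self.docs)  # docid-like counter

    def commit(self):
        self.commits += 1


def index_in_threads(n_threads=4, docs_per_thread=250):
    db = FakeDb()
    writer = SerializedWriter(db)

    def worker(tid):
        for i in range(docs_per_thread):
            writer.add_document(f"doc-{tid}-{i}")

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    writer.commit()  # a single commit once all workers are done
    return db
```

The design choice is that only one thread is ever inside the database object at a time, so from the database's point of view there is still a single writer.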
> Assuming nobody deleted the lock file, this could be a Xapian bug. This
> isn't something we're drowning in reports of, so presumably it doesn't
> trigger easily, so finding a way to reproduce would be good.
>
> It could also be memory or disk corruption. We don't currently store
> a checksum for each block, so there's no explicit detection of this.
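To show what explicit detection could look like, here is a minimal sketch of a per-block checksum. Per Olly, Xapian 1.4 stores nothing like this, so the layout (a CRC32 in the last 4 bytes of each 8 KB block) and the function names are hypothetical. A CRC catches random bit flips and stray bytes, but not a well-formed block that a buggy writer put in the wrong place.

```python
import struct
import zlib

BLOCK_SIZE = 8192  # 8 KB, the blocksize xapian-check reports for this index
CRC_LEN = 4        # hypothetical layout: last 4 bytes of each block hold a CRC32


def seal_block(payload: bytes) -> bytes:
    """Pad payload to BLOCK_SIZE - CRC_LEN bytes and append a CRC32 of the body."""
    if len(payload) > BLOCK_SIZE - CRC_LEN:
        raise ValueError("payload too large for one block")
    body = payload.ljust(BLOCK_SIZE - CRC_LEN, b"\0")
    return body + struct.pack("<I", zlib.crc32(body))


def verify_block(block: bytes) -> bool:
    """Return True if the stored CRC32 matches the block body."""
    body, stored = block[:-CRC_LEN], block[-CRC_LEN:]
    return struct.unpack("<I", stored)[0] == zlib.crc32(body)
```

Flipping any single bit of a sealed block makes `verify_block` return `False`.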
>
> Or something in the same process wrote to an fd that has since been
> closed and reused for one of the database tables (Xapian avoids reusing
> fds 0, 1 and 2 to avoid this for the standard streams, but it's hard to
> fully protect against this given how fds work).
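The fd-reuse hazard is easy to reproduce, because POSIX `open()` returns the lowest free descriptor. The sketch below closes a "log" fd, opens a "table" file that receives the same number, and then writes through the stale number. The second helper shows one way of keeping a file off fds 0, 1 and 2 using `F_DUPFD`. It is a hypothetical illustration and not necessarily how Xapian does it. File and function names are made up for the example.

```python
import fcntl
import os
import shutil
import tempfile


def demo_stale_fd_write():
    """Show how a write through a closed-and-reused fd lands in another file."""
    d = tempfile.mkdtemp()
    try:
        log_fd = os.open(os.path.join(d, "log.txt"), os.O_WRONLY | os.O_CREAT, 0o644)
        os.close(log_fd)  # the log is closed, but buggy code keeps using log_fd
        table_path = os.path.join(d, "table.db")  # hypothetical "database table"
        table_fd = os.open(table_path, os.O_RDWR | os.O_CREAT, 0o644)
        reused = table_fd == log_fd  # open() handed out the lowest free fd again
        if reused:
            os.write(log_fd, b"stray log line\n")  # silently goes into table.db
        os.close(table_fd)
        with open(table_path, "rb") as f:
            return reused, f.read()
    finally:
        shutil.rmtree(d)


def open_above_std_streams(path, flags, mode=0o644):
    """Open path, making sure the returned fd is not 0, 1 or 2, so that stray
    writes to stdin/stdout/stderr cannot hit the file."""
    fd = os.open(path, flags, mode)
    if fd < 3:
        high = fcntl.fcntl(fd, fcntl.F_DUPFD, 3)  # lowest free fd >= 3
        os.close(fd)
        fd = high
    return fd
```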
This is certainly a possibility. In that case, with luck, we might get an
idea by looking at the actual data. What would be the best way to take a
peek?
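One low-tech way to peek at the data is to dump a raw block. This sketch assumes that block *n* of a table file starts at byte offset *n* × blocksize (8 KB for this index, per xapian-check). The exact table file name depends on the backend, so the path is left to the caller. `read_block` and `hexdump` are hypothetical helpers, not Xapian tools.

```python
def read_block(path, block_no, block_size=8192):
    """Return the raw bytes of block block_no, assuming block n starts at
    byte offset n * block_size in the table file."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError(f"block {block_no} is beyond the end of {path}")
    return data


def hexdump(data, width=16, limit=256):
    """Format the first `limit` bytes as offset / hex / printable-ASCII lines."""
    lines = []
    for off in range(0, min(len(data), limit), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)
```

For example, `print(hexdump(read_block(path_to_postlist_table, 238)))` would show the start of the postlist root block (root=238 in the xapian-check output), with `path_to_postlist_table` standing in for the real file.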
> Or something else perhaps.
>
> > I've asked the kind user to run xapian-check on the index and post the
> > output.
>
> That's a good thing to check. If xapian-check finds no problems, then
> it's presumably just an in-core issue, which points to a Xapian bug or
> memory issues.
The output of xapian-check follows.
Best regards,
Jf
xapian-check ~/.recoll/xapiandb
record:
baseB blocksize=8K items=943378 lastblock=85955 revision=6207 levels=2 root=18014
B-tree checked okay
record table structure checked OK
termlist:
baseB blocksize=8K items=1886756 lastblock=417475 revision=6207 levels=3 root=83720
B-tree checked okay
termlist table structure checked OK
postlist:
baseB blocksize=8K items=8872525 lastblock=524452 revision=6207 levels=3 root=238
B-tree checked okay
termfreq 197211 != # of entries 197210
collfreq 10861536 != sum wdf 10861533
termfreq 14189 != # of entries 14188
collfreq 98354 != sum wdf 98344
termfreq 9866 != # of entries 9865
collfreq 56453 != sum wdf 56443
termfreq 195141 != # of entries 195137
collfreq 8126093 != sum wdf 8126079
postlist table errors found: 8
position:
baseB blocksize=8K items=180902610 lastblock=1701333 revision=6207 levels=3 root=48617
B-tree checked okay
position table structure checked OK
spelling:
Lazily created, and not yet used.
synonym:
baseB blocksize=8K items=1369690 lastblock=32050 revision=6207 levels=2 root=2
B-tree checked okay
synonym table: Don't know how to check structure
Total errors found: 8
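The eight postlist errors are four terms, each failing two comparisons. A stored per-term statistic is checked against a value recomputed from the term's posting list. termfreq (documents containing the term) is compared with the number of entries, and collfreq (total occurrences) with the sum of the per-document wdf values. Here is a minimal sketch of that comparison. It assumes a posting list is a list of `(docid, wdf)` pairs, and the function name is hypothetical rather than xapian-check's code.

```python
def check_term_stats(stored_termfreq, stored_collfreq, postings):
    """Compare stored per-term statistics with values recomputed from the
    posting list, given as (docid, wdf) pairs. Returns messages in the
    style of the xapian-check output above."""
    errors = []
    entries = len(postings)
    sum_wdf = sum(wdf for _docid, wdf in postings)
    if stored_termfreq != entries:
        errors.append(f"termfreq {stored_termfreq} != # of entries {entries}")
    if stored_collfreq != sum_wdf:
        errors.append(f"collfreq {stored_collfreq} != sum wdf {sum_wdf}")
    return errors
```

In every reported pair the stored value is larger than the recomputed one: termfreq by 1, 1, 1 and 4, and collfreq by 3, 10, 10 and 14.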