Xapian 1.4.3 "Db block overwritten - are there multiple writers?"

Olly Betts olly at survex.com
Sun May 21 22:54:24 BST 2017


On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote:
> I have a user reporting the following error during recoll indexing:
> 
>     flush() failed: Db block overwritten - are there multiple writers?
> 
> "flush() failed" is from recoll, the rest is, I think the text of the Xapian
> exception.
> 
> This is with Xapian 1.4.3 on Linux (I asked for more details, should be
> coming).
> 
> I don't think that I've ever seen this error, and I also don't think that
> there have been significant changes to recoll in this area, but as usual, I
> may be wrong.

What this means is that the database appears to have a child block which
is newer than its parent block (in the real world children are younger
than their parents, but in current Xapian DBs the reverse should be the
case - blocks are copied on write and the parent block points to its
children, so needs updating whenever any of its children are).
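A toy model may make the ordering rule concrete. The `Block` class and the `cow_update()` and `parents_not_older()` functions below are hypothetical, not Xapian's on-disk format. They only show why, with copy-on-write, a parent is always at least as new as each of its children: changing a leaf forces a fresh copy of every block on the path up to the root, all stamped with the new revision.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical in-memory stand-in for a B-tree block (not Xapian's format)."""
    revision: int
    children: List["Block"] = field(default_factory=list)
    data: str = ""


def cow_update(path: List[Block], new_data: str, revision: int) -> Block:
    """Copy-on-write update of the leaf at the end of `path` (root first).

    Every block on the path is copied and stamped with `revision`; blocks off
    the path are shared with the old tree, which is left untouched.
    Returns the new root.
    """
    new_child = Block(revision, [], new_data)  # fresh copy of the leaf
    # Walk back up the path: each parent must be rewritten so that it points
    # at the freshly written child rather than the old one.
    for parent, old_child in zip(reversed(path[:-1]), reversed(path[1:])):
        children = [new_child if c is old_child else c for c in parent.children]
        new_child = Block(revision, children, parent.data)
    return new_child


def parents_not_older(block: Block) -> bool:
    """Check the invariant: no child is newer than its parent."""
    return all(block.revision >= c.revision and parents_not_older(c)
               for c in block.children)


# Usage: update one leaf of a two-leaf tree at revision 2.
leaf_a, leaf_b = Block(1, data="a"), Block(1, data="b")
old_root = Block(1, [leaf_a, leaf_b])
new_root = cow_update([old_root, leaf_a], "a2", revision=2)
print(new_root.revision, new_root.children[0].data, old_root.children[0].data)  # 2 a2 a
```

The untouched sibling `leaf_b` is shared between the old and new trees, which is what makes copy-on-write cheap; the old root still describes a complete, consistent older revision.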

When reading a database, this is possible if a writer has updated that
part of the tree between reading the parent and reading the child (and
gives DatabaseModifiedError).

When writing, this shouldn't happen.
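The two cases can be sketched as one check applied while descending the tree. The exception classes and functions here are hypothetical stand-ins. In Xapian's C++ API the reader-side error is `Xapian::DatabaseModifiedError`, which readers usually handle by calling `Xapian::Database::reopen()` and retrying. Treating the writer-side case as a separate "damage" error is an assumption made for illustration.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for Xapian::DatabaseModifiedError."""


class BlockOverwrittenError(Exception):
    """Hypothetical stand-in for the error a writer reports."""


def check_child(parent_revision: int, child_revision: int, writing: bool) -> None:
    """Raise if a child block is newer than the parent that pointed to it."""
    if child_revision <= parent_revision:
        return  # the copy-on-write ordering holds
    if writing:
        # The writer holds the lock, so nothing else should have changed the
        # tree underneath it: treat this as damage, not a race.
        raise BlockOverwrittenError(
            "Db block overwritten - are there multiple writers?")
    # A reader can legitimately race with a writer committing a new revision.
    raise DatabaseModifiedError("database modified while reading; reopen and retry")


def read_with_retry(read, reopen, attempts=3):
    """Reader-side pattern: on DatabaseModifiedError, reopen and try again."""
    for attempt in range(attempts):
        try:
            return read()
        except DatabaseModifiedError:
            if attempt == attempts - 1:
                raise
            reopen()
```

Only the reader gets a retry loop: for a writer the same symptom means the "one writer at a time" assumption has already been broken, so retrying would not help.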

As the error suggests, if you manage to get multiple concurrent writers
this could happen.  There's locking which should prevent this, but that
can be defeated if the lock file is deleted (which people sometimes add
code to do, misunderstanding how the lock file is used: fcntl() locking
is used, and the lock file should always be present).
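To see why deleting the lock file defeats the locking, here is a sketch in Python on a POSIX system. It is not Xapian's own locking code, and the file name `lockfile` is made up. An fcntl() lock is attached to the open file (the inode), not to the path name. Once the file is unlinked, the next writer creates a new inode at the same path and locks that without any conflict.

```python
import fcntl
import os
import tempfile


def child_can_lock(path: str) -> bool:
    """Fork a process that tries to take an exclusive fcntl() lock on `path`.

    fcntl() locks are owned by processes, so a lock held by this process
    never conflicts with itself; the attempt has to come from another one.
    """
    pid = os.fork()
    if pid == 0:  # child
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os._exit(1)  # somebody else holds the lock
        os._exit(0)      # lock acquired
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def deleted_lock_file_demo():
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "lockfile")  # hypothetical name
        # Writer 1 creates the lock file, locks it and keeps the fd open.
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        second_writer_blocked = not child_can_lock(lock_path)
        # Well-meaning "cleanup" code removes the lock file...
        os.unlink(lock_path)
        # ...so the next writer creates a new file (a new inode) and locks it.
        second_writer_allowed_after_unlink = child_can_lock(lock_path)
        os.close(fd)
        return second_writer_blocked, second_writer_allowed_after_unlink


print(deleted_lock_file_demo())  # (True, True) on Linux
```

Writer 1 is still running and still believes it holds the lock, so at this point two writers can modify the same database.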

Assuming nobody deleted the lock file, this could be a Xapian bug.  This
isn't something we're drowning in reports of, so presumably it doesn't
trigger easily; finding a way to reproduce it would be good.

It could also be memory or disk corruption.  We don't currently store
a checksum for each block, so there's no explicit detection of this.
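Purely as an illustration of what a per-block checksum would add, here is a sketch with a made-up layout: a CRC-32 appended to each block. The helpers `seal_block()` and `open_block()` are hypothetical and this is not how Xapian 1.4 stores blocks. It also shows the limit of the idea. A checksum catches a block damaged after it was written, but not one that was already wrong in memory when the checksum was computed.

```python
import struct
import zlib


def seal_block(payload: bytes) -> bytes:
    """Append a little-endian CRC-32 of the payload (hypothetical layout)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def open_block(block: bytes) -> bytes:
    """Verify and strip the checksum, raising if the block was damaged."""
    payload, (stored,) = block[:-4], struct.unpack("<I", block[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("block checksum mismatch: corrupted on disk or in memory?")
    return payload


sealed = seal_block(b"leaf block contents")
damaged = bytes([sealed[0] ^ 0x01]) + sealed[1:]  # flip one bit, as bad RAM might
```

Reading `sealed` back returns the original payload, while reading `damaged` raises `ValueError` instead of silently handing corrupt data to the B-tree code.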

Or something in the same process wrote to an fd after it had been closed
and its number reused for one of the database tables (Xapian avoids using
fds 0, 1 and 2 for its tables, which guards against this for the standard
streams, but it's hard to fully protect against this given how fds work).
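The fd-reuse hazard is easy to reproduce, because POSIX open() returns the lowest-numbered free descriptor. The sketch below is Python on Linux. The file names and the `open_above_std_fds()` helper are made up and are not Xapian's code. It shows a stale write landing in a newly opened "table" file, and one way a library might keep its own files off fds 0, 1 and 2.

```python
import fcntl
import os
import tempfile


def stale_fd_demo():
    """Show a write through a closed fd landing in whatever reused its number."""
    with tempfile.TemporaryDirectory() as d:
        log_fd = os.open(os.path.join(d, "log.txt"), os.O_WRONLY | os.O_CREAT, 0o644)
        os.close(log_fd)  # buggy code closes the fd but keeps using the number
        table_path = os.path.join(d, "table.glass")  # hypothetical table file
        table_fd = os.open(table_path, os.O_RDWR | os.O_CREAT, 0o644)
        reused = table_fd == log_fd  # open() returns the lowest free number
        os.write(log_fd, b"oops")    # stale write goes into the table file
        os.close(table_fd)
        with open(table_path, "rb") as f:
            return reused, f.read()


def open_above_std_fds(path: str, flags: int, mode: int = 0o644) -> int:
    """Open `path`, making sure the fd is not 0, 1 or 2 (hypothetical helper)."""
    fd = os.open(path, flags, mode)
    if fd < 3:
        # Duplicate to the lowest free fd >= 3 and drop the low one, so stray
        # writes to stdin/stdout/stderr can't reach this file.
        new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, 3)
        os.close(fd)
        fd = new_fd
    return fd


print(stale_fd_demo())  # (True, b'oops') in a single-threaded process
```

The helper only protects the three standard descriptors. Nothing stops unrelated code from writing to a stale fd that happens to match a higher-numbered table fd, which is why full protection is hard.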

Or something else perhaps.

> I've asked the kind user to run xapian-check on the index and post the
> output.

That's a good thing to check.  If xapian-check finds no problems, then
it's presumably just an in-core issue, which points to a Xapian bug or
memory issues.

Cheers,
    Olly



More information about the Xapian-discuss mailing list