Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass

Olly Betts olly at survex.com
Tue Jul 10 06:36:28 BST 2018


On Mon, Jul 09, 2018 at 10:29:18AM +0100, Olly Betts wrote:
> The attached patch reset this cursor each time commit() is called, and
> that fixes my C++ reproducer, though I think this ought to work as-is
> and the real bug is at a lower level.

I've dug deeper and that was indeed the case.  Here's a patch which
addresses the root cause:

https://oligarchy.co.uk/xapian/patches/glass-cursor-rebuild-fix.patch

For the curious, the bug was in some code to rebuild the cursor when the
underlying table changes in ways which require that.  That's a fairly
rare occurrence (with my C++ reproducer it happens 99 times out of 5000
commits).

In chert the equivalent code just marks the cursor's blocks as not yet
read, but in glass cursor blocks are reference counted and shared so we
can't simply do that as it could affect other cursors sharing the same
blocks.
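To illustrate why in-place invalidation is unsafe once blocks are shared, here is a minimal toy sketch. It is not Xapian's code: `Block`, `Cursor`, `invalidate_in_place()` and `in_place_invalidation_leaks()` are hypothetical names, and `std::shared_ptr` merely stands in for glass's reference counting.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a B-tree block held by a cursor (hypothetical type).
struct Block {
    int number;  // block number; -1 is used here to mean "not yet read"
};

// One entry per B-tree level; blocks are shared via reference counting.
using Cursor = std::vector<std::shared_ptr<Block>>;

// Chert-style invalidation: mark every block as not yet read, in place.
void invalidate_in_place(Cursor& c) {
    for (auto& b : c) b->number = -1;
}

// Returns true if invalidating one cursor in place also wiped another
// cursor which shares the same blocks.
bool in_place_invalidation_leaks() {
    Cursor a = {std::make_shared<Block>(Block{7}),
                std::make_shared<Block>(Block{3})};
    Cursor b = a;  // copies the pointers, so both cursors share the blocks
    invalidate_in_place(a);
    return b[0]->number == -1 && b[1]->number == -1;
}
```

In this toy model `in_place_invalidation_leaks()` returns true: cursor `b` loses its blocks even though only `a` was meant to be invalidated.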

So instead the glass code was leaving them with the contents they
previously had, except for copying the current root block from the
table's "built-in cursor".  After the rebuild we seek the cursor to the
same key it was on before, and that mostly works because we follow down
each level in the Btree from the new root.  However, the old cursor can
contain a block number which has since been released and reallocated.
In that case the block doesn't get reread and we try to use its old
contents, which violates the rule that a parent can't be younger than
its child and causes the exception.

The simplest fix is to just reset the rebuilt cursor to match the
current "built-in cursor" at all levels (not just the root), which is
cheap because of the reference counting.  And that fixes my C++
reproducer, which I converted from your Python reproducer.
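The difference between the two rebuild strategies can be sketched with a toy model. This is an illustration under stated assumptions, not the code from the patch: all names (`Block`, `Cursor`, `Disk`, `seek()`, `rebuild_root_only()`, `rebuild_all_levels()`, `run_scenario()`) are hypothetical, a block's contents are reduced to a single integer, and staleness is detected by comparing against the toy "disk" rather than by Xapian's revision check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Toy block: a block number plus a tag standing in for its contents.
struct Block {
    int number;
    int contents;
};
using BlockPtr = std::shared_ptr<const Block>;
using Cursor = std::vector<BlockPtr>;  // index 0 is the root, back() the leaf
using Disk = std::map<int, int>;       // block number -> current contents

BlockPtr read_block(const Disk& disk, int n) {
    return std::make_shared<Block>(Block{n, disk.at(n)});
}

// Walk down from the root; a level is reread only when the cached block
// number differs from the one the path needs.
void seek(Cursor& c, const std::vector<int>& path, const Disk& disk) {
    for (std::size_t level = 1; level < c.size(); ++level) {
        if (c[level]->number != path[level])
            c[level] = read_block(disk, path[level]);
    }
}

// True if any level holds contents which no longer match the disk.
bool has_stale_block(const Cursor& c, const Disk& disk) {
    for (const auto& b : c)
        if (b->contents != disk.at(b->number)) return true;
    return false;
}

// Old behaviour (toy version): refresh only the root.
void rebuild_root_only(Cursor& c, const Cursor& built_in) {
    c[0] = built_in[0];
}

// Fixed behaviour (toy version): share every level with the built-in
// cursor; copying shared_ptrs only bumps reference counts.
void rebuild_all_levels(Cursor& c, const Cursor& built_in) {
    c = built_in;
}

struct Outcome {
    bool root_only_stale;
    bool all_levels_stale;
};

Outcome run_scenario() {
    Disk disk = {{1, 100}, {2, 200}, {3, 300}};
    Cursor built_in = {read_block(disk, 1), read_block(disk, 2),
                       read_block(disk, 3)};
    Cursor a = built_in, b = built_in;

    // The table changes: a new root (block 4) is written, and block 2 is
    // released and then reallocated with different contents.
    disk[4] = 400;
    disk[2] = 201;
    built_in[0] = read_block(disk, 4);
    built_in[1] = read_block(disk, 2);
    const std::vector<int> path = {4, 2, 3};

    rebuild_root_only(a, built_in);
    seek(a, path, disk);
    rebuild_all_levels(b, built_in);
    seek(b, path, disk);
    return {has_stale_block(a, disk), has_stale_block(b, disk)};
}
```

In this model the root-only rebuild keeps the old contents of block 2, because the block number still matches and so `seek()` never rereads it. Copying all levels from the built-in cursor leaves no stale block, and costs only a few reference count updates.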

Please test and let me know if this fixes the original problem or not.

Cheers,
    Olly


