Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass
Sylvain Taverne
taverne.sylvain at gmail.com
Tue Jul 17 12:08:13 BST 2018
Hello,
The patch seems to fix the problem.
I've created a Dockerfile to test your patch:
https://github.com/staverne/xapian_test/blob/master/docker/Dockerfile
I no longer get the corruption errors.
So good job!
On Tue, 10 Jul 2018 at 14:35, Sylvain Taverne <taverne.sylvain at gmail.com>
wrote:
> Thanks!!
> I'll try it during the week and let you know if all is fine ;)
>
> On Tue, 10 Jul 2018 at 07:36, Olly Betts <olly at survex.com> wrote:
>
>> On Mon, Jul 09, 2018 at 10:29:18AM +0100, Olly Betts wrote:
>> > The attached patch resets this cursor each time commit() is called, and
>> > that fixes my C++ reproducer, though I think this ought to work as-is
>> > and the real bug is at a lower level.
>>
>> I've dug deeper and that was indeed the case. Here's a patch which
>> addresses the root cause:
>>
>> https://oligarchy.co.uk/xapian/patches/glass-cursor-rebuild-fix.patch
>>
>> For the curious, the bug was in some code to rebuild the cursor when the
>> underlying table changes in ways which require that. That's a fairly
>> rare occurrence (with my C++ reproducer it happens 99 times out of 5000
>> commits).
>>
>> In chert the equivalent code just marks the cursor's blocks as not yet
>> read, but in glass cursor blocks are reference counted and shared so we
>> can't simply do that as it could affect other cursors sharing the same
>> blocks.
>>
>> So instead the glass code was leaving them with the contents they
>> previously had, except for copying the current root block from the
>> table's "built-in cursor". After the rebuild we seek the cursor to the
>> same key it was on before, and that mostly works because we follow down
>> each level in the Btree from the new root, except it can happen that the
>> old cursor contained a block number which has since been released and
>> reallocated, and in that case the block doesn't get reread and we try to
>> use its old contents, which violates the rule that a parent can't be
>> younger than its child and causes the exception.
>>
>> The simplest fix is to just reset the rebuilt cursor to match the
>> current "built-in cursor" at all levels (not just the root), which is
>> cheap because of the reference counting. And that fixes my C++
>> reproducer, which I converted from your Python reproducer.
>>
>> Please test and let me know if this fixes the original problem or not.
>>
>> Cheers,
>> Olly
>>
>
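To make the failure mode described above easier to picture, here is a minimal
C++ sketch of it. It is a toy model under stated assumptions, not Xapian's
code: the Block, Cursor and Table types and every function name are
hypothetical, and staleness is detected by comparing cached blocks against the
table directly rather than through glass's revision check. It only shows why
refreshing the root alone can leave a reused block number unread, and why
sharing every level with the "built-in cursor" avoids that.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Toy model only: these types are hypothetical, not Xapian's internals.
struct Block {
    unsigned number;    // block number within the table
    unsigned revision;  // revision that wrote these contents
};
using BlockPtr = std::shared_ptr<const Block>;

// A cursor caches one block per Btree level (index 0 is the root here).
struct Cursor {
    std::vector<BlockPtr> levels;
};

struct Table {
    std::map<unsigned, BlockPtr> blocks;  // current contents by block number
    Cursor builtin;                       // always matches the current tree

    // Write new contents for a block number (also models reallocation).
    void write(unsigned number, unsigned revision) {
        blocks[number] = std::make_shared<const Block>(Block{number, revision});
    }

    // Point the built-in cursor at the blocks along `path` (root first).
    void set_path(const std::vector<unsigned>& path) {
        builtin.levels.clear();
        for (unsigned n : path) builtin.levels.push_back(blocks.at(n));
    }
};

// Re-seek below the root: a level is reread only when the cached block
// number differs from the one the path asks for.
void seek(const Table& t, Cursor& c, const std::vector<unsigned>& path) {
    c.levels.resize(path.size());
    for (std::size_t i = 1; i < path.size(); ++i) {
        if (!c.levels[i] || c.levels[i]->number != path[i])
            c.levels[i] = t.blocks.at(path[i]);
    }
}

// Rebuild in the style described for the old code: refresh only the root.
void rebuild_root_only(const Table& t, Cursor& c) {
    if (t.builtin.levels.empty()) return;
    c.levels.resize(t.builtin.levels.size());
    c.levels[0] = t.builtin.levels[0];
}

// Rebuild in the style described for the fix: share every level.
void rebuild_all_levels(const Table& t, Cursor& c) {
    c.levels = t.builtin.levels;  // copies shared_ptrs only, so it is cheap
}

// True if any cached block no longer matches the table's current contents.
bool has_stale_block(const Table& t, const Cursor& c) {
    for (const BlockPtr& b : c.levels)
        if (b && t.blocks.at(b->number) != b) return true;
    return false;
}

// Scenario: block 7 sits under the root, is released and reallocated with
// new contents, and the cursor is then rebuilt and re-sought to the same key.
bool stale_after_rebuild(bool fixed) {
    Table t;
    t.write(1, 1);  // root
    t.write(7, 1);  // child
    t.set_path({1, 7});
    Cursor c = t.builtin;  // user cursor caches blocks 1 and 7 at revision 1

    t.write(1, 2);  // new root
    t.write(7, 2);  // block number 7 reused with different contents
    t.set_path({1, 7});

    if (fixed) rebuild_all_levels(t, c); else rebuild_root_only(t, c);
    seek(t, c, {1, 7});
    return has_stale_block(t, c);
}
```

In this sketch, stale_after_rebuild(false) models the root-only rebuild and
stale_after_rebuild(true) models resetting all levels. The second variant
costs little in the toy for the same reason given above: only reference-counted
pointers are copied, not block contents.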