Strange index consistency issue
Jean-Francois Dockes
jf at dockes.org
Fri Jan 15 07:48:30 GMT 2016
Olly Betts writes:
> On Thu, Jan 14, 2016 at 11:04:29AM +0100, Jean-Francois Dockes wrote:
> > Olly Betts writes:
> > > On Sun, Jan 10, 2016 at 02:53:14AM +0000, Bob Cargill wrote:
> > > > I will look into the bug you listed to see if it might be
> > > > related. If there is anything else that I can do, please let me
> > > > know.
> > >
> > > If that bug is not the cause, it would be good to get to the bottom
> > > of this - the database shouldn't become corrupt like this.
> >
> > I remembered something: I could only reproduce issue #645 with
> > separate read/write database objects, but this one is with recoll
> > 1.21, which uses a single object, so maybe a different problem.
>
> The underlying bug for #645 was that cursors weren't getting rebuilt in
> some situations where they needed to be, and could end up with bad data
> in them, and that bad data could be stale. So it's plausible a write
> might go to the wrong block, which could explain "lost" data like we
> have here.
>
> It could easily be a different problem, but testing with the latest
> 1.2.x would be useful to make sure we aren't trying to track down a bug
> we've already fixed.
>
> > While a Xapian bug might be involved, there are many reasons why a
> > Recoll indexer can meet an abrupt end in the general case (not saying
> > this is the case here).
> > A pulled power cord would be the most radical example. Recoll usually
> > does not run in a datacenter...
> >
> > In most cases, the data is replaceable without too much effort, so
> > that reliable detection of an issue is almost as good as assurance
> > that it won't occur. The latter seems very difficult to attain when
> > running in an uncontrolled environment.
>
> It may not matter for recoll, but more generally we don't want Xapian
> databases getting corrupt. And we do aim to survive power failures,
> kernel panics, etc. - achieving that in all cases is rather hard, but I
> don't think that's a reason to drop it as an aim.
It was not my intention to suggest this.
As an aside, it *does* matter for Recoll that its index survive such
events. A few Recoll users have gigantic indexes (hopefully in sane
environments) which need multiple days to rebuild.
Being oldish and having spent 30 years around data management issues, I
just happen to believe that datacenter RDBMS-type reliability is *not
possible* for the typical Recoll installation, on a random machine, with an
arbitrary filesystem and IO subsystem (haven't there been a few issues
around post-crash data consistency on Linux filesystems?).
This is why I believe that, faced with uncertain reliability, and equipped
with backed-up data, corruption detection is a very important feature, even
if it can't be completely reliable either.
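To illustrate, here is a minimal sketch of what such detection could look
like with the Xapian C++ API. Xapian::Database::check needs a recent
Xapian (1.3+); the xapian-check tool offers the same service for older
releases, and the database path here is just illustrative:

    #include <xapian.h>
    #include <iostream>

    int main(int argc, char **argv) {
        const char *dbpath = argc > 1 ? argv[1] : "xapiandb";
        try {
            // Structural check of the on-disk tables; returns the
            // number of errors found, writing details to the stream.
            size_t errors = Xapian::Database::check(dbpath, 0, &std::cout);
            std::cout << (errors ? "Index is damaged\n"
                                 : "Index looks sane\n");
            return errors ? 1 : 0;
        } catch (const Xapian::Error &e) {
            // E.g. DatabaseCorruptError if the check cannot proceed.
            std::cerr << "check failed: " << e.get_description() << "\n";
            return 2;
        }
    }
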
> Examples of corruption that can be reproduced (even if it's not entirely
> on demand) are very useful - if you can see the corruption happen it's
> a lot easier to work out what is going wrong than if you just see the
> aftermath.
And I do intend to provide such examples whenever possible. I was just
trying to make it clear that I was not necessarily looking for a fault in
Xapian code.
> > There is one weird thing though, which is why, in this situation,
> > replace_document() appears to repeatedly accept data which goes into a
> > black hole.
>
> Are you replacing the document with the same data?
Bob answered this: yes. Mystery solved.
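For the record, a minimal sketch of the pattern involved (the reindex
helper and its arguments are hypothetical; the calls are the standard
Xapian C++ API), matching Olly's explanation below:

    #include <xapian.h>

    // Hypothetical reindexing helper: if 'doc' carries exactly the same
    // terms (and hence the same sum(wdf)) as the stored version, Xapian
    // sees nothing to update and never reads the document length list,
    // so damage there goes undetected -- see the explanation below.
    void reindex(Xapian::WritableDatabase &db, Xapian::docid did,
                 const Xapian::Document &doc) {
        db.replace_document(did, doc);
        db.commit();
    }
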
Cheers,
jf
> If so, I think what happens is that it looks in the termlist table to
> see if the document exists. It does, so it compares the terms and sees
> they are the same, and decides there's nothing to do.
>
> It never looks at the document length list, so doesn't see that it is
> damaged.
>
> Or if it's different data, but with the same "document length" (i.e.
> sum(wdf)), then it'll update the termlist, but spot the length hasn't
> changed, so again it won't bother to look at the document length list.
>
> If you replaced the document with a modified version with a different
> length, I'd expect this would actually "self-heal".
>
> Cheers,
> Olly
>