[Xapian-tickets] [Xapian] #317: Database corruption after disk-full error

Thu Dec 18 03:35:50 GMT 2008

#317: Database corruption after disk-full error
---------------------------+------------------------------------------------
 Reporter:  richard        |        Owner:  richard
     Type:  defect         |       Status:  new    
 Priority:  normal         |    Milestone:  1.0.10 
Component:  Backend-Flint  |      Version:  1.0.7  
 Severity:  normal         |   Resolution:         
 Keywords:                 |    Blockedby:         
 Platform:  All            |     Blocking:         
---------------------------+------------------------------------------------

Old description:

> I've recently been testing the behaviour of xapian when the disk becomes
> full, after reports of corruption at a customer site in this situation,
> by performing some indexing to a database in a small partition.
>
> The key seems to be that, if a WritableDatabase is re-used after an
> operation with it has encountered an IOError, all sorts of corruption is
> possible.  I've got a Python script which repeatably produces a corrupt
> database when run in a small partition, which I'll attach here shortly.
> However, the exact mode of failure is very sensitive to the initial
> amount of space available.
>
> I've only tested this with the flint backend so far, and only with Xapian
> 1.0.7 (the version in Ubuntu Hardy), but it's likely that chert and more
> recent Xapian versions have a similar problem.

New description:

 I've recently been testing the behaviour of xapian when the disk becomes
 full, after reports of corruption at a customer site in this situation, by
 performing some indexing to a database in a small partition.

 The key seems to be that, if a !WritableDatabase is re-used after an
 operation with it has encountered an IOError, all sorts of corruption is
 possible.  I've got a Python script which repeatably produces a corrupt
 database when run in a small partition, which I'll attach here shortly.
 However, the exact mode of failure is very sensitive to the initial amount
 of space available.

 I've only tested this with the flint backend so far, and only with Xapian
 1.0.7 (the version in Ubuntu Hardy), but it's likely that chert and more
 recent Xapian versions have a similar problem.
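
 Until the backend handles this itself, one application-side workaround
 would be to refuse to touch a handle again once any operation on it has
 raised.  The following is only a minimal sketch of that idea, not the
 patch under discussion: GuardedWriter and PoisonedHandleError are
 hypothetical names, and the wrapped object is assumed to expose
 add_document() and flush() (the 1.0.x name for committing), as a
 WritableDatabase does.

```python
class PoisonedHandleError(RuntimeError):
    """Raised when a writer is used again after an earlier failure."""


class GuardedWriter:
    """Hypothetical wrapper that forbids reuse of a handle after an error.

    'db' is assumed to be any object with add_document() and flush()
    methods, for example a writable database handle.
    """

    def __init__(self, db):
        self._db = db
        self.failed = False

    def _call(self, name, *args):
        if self.failed:
            # The underlying handle may be in an inconsistent state, so
            # refuse to forward the call; the caller should open a new one.
            raise PoisonedHandleError(
                "handle failed earlier; open a fresh one instead of reusing it"
            )
        try:
            return getattr(self._db, name)(*args)
        except Exception:
            # Any exception (e.g. disk full) marks the handle as unusable,
            # then propagates unchanged to the caller.
            self.failed = True
            raise

    def add_document(self, doc):
        return self._call("add_document", doc)

    def flush(self):
        return self._call("flush")
```

 With this in place, a caller that catches the first error and carries on
 indexing gets a PoisonedHandleError instead of silently writing through a
 handle whose state is unknown.  It does not repair anything already
 written; it only prevents the re-use described above.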

--

Comment(by olly):

 Hmm, what is actually the first system call which fails due to the full
 disk in this scenario?

 Judging from the patch, it seems it is cancel() which must be throwing,
 but that only seems to read from disk so I'm not sure why it would fail in
 this situation...

 The fix does indeed look promising, though I'm not totally sure it's
 exactly the right (or at least best) approach (perhaps we should be more
 fine-grained about where the exception happened), and also it seems that
 if we throw in the new code, we're probably no better off...

-- 
Ticket URL: <http://trac.xapian.org/ticket/317#comment:3>
Xapian <http://xapian.org/>