automatic reopen + retry on DatabaseModifiedError

Mon Dec 12 21:57:39 GMT 2016

On Sat, Dec 10, 2016 at 11:42:49PM +0000, Eric Wong wrote:
> Hello, is there a reason Xapian does not have any way to
> automatically reopen + retry on DatabaseModifiedError
> exceptions? 

Because you probably want to restart the whole operation, not just the
one method.  If we just update to the latest revision and retry, you
can get inconsistent results - e.g. a document in the search results
from an older revision could have been deleted in the latest revision.
So DatabaseModifiedError on get_document() probably wants to cause
get_mset() to be rerun, which just isn't feasible to achieve inside
the library.

>  (It would even be preferential to return stale data...)

If you get DatabaseModifiedError, there's no suitable data from the
revision the reader was working on available to return, nor from
any older revision, at least with glass as it reuses blocks freed
longest ago first.  Chert reused blocks nearest the start of the file
first, which should mean DatabaseModifiedError is less likely with
glass.

The longer term plan is that we'll keep revisions that are in use by
readers around (at least optionally), but 1.4.x was already late
enough without holding it for anything else.

It would probably be fairly easy to patch glass to optionally keep
a buffer of one previous revision in the freelist, which could be
slotted into 1.4.x with a new DB_* option for WritableDatabase's
constructor.  That way readers would be safe for one commit from
the writer (at the expensive of the database being larger, which
it also true of the longer-term reader locking plan), so if you
take care not to commit more often than the longest time a search
can sanely take you shouldn't get DatabaseModifiedError.

If anyone fancies taking a look, backends/glass/glass_freelist.cc
(and .h) is the place, and fl_end is what currently determines the
most recent freelist entry to release.

> I'm dealing with Search::Xapian::DatabaseModifiedError exceptions
> in public-inbox[1] and going through the code to make sure there's
> no more places it could fire.  But at this point, it seems like
> whack-a-mole and I'm not sure if I've found everything.

Basically anything potentially can.  In practice some things only
work off cached data currently, but those are really implementation
details.  So retrying by catching the exception at a higher level
rather than per call is simpler from that point of view too.  The
main issue is it means you need to buffer all the output up rather
than just sending it as you go.

Cheers,
    Olly