[Xapian-devel] UUIDs for databases

Olly Betts olly at survex.com
Sun Aug 3 07:40:03 BST 2008


On Sun, Aug 03, 2008 at 01:10:07AM +0100, Richard Boulton wrote:
> UUIDs aren't currently externally exposed (other than as part of an 
> opaque blob of data specifying a particular database/revision 
> combination), so we can probably just focus on providing UUIDs which are 
> good enough to allow the replication process to function robustly, 
> rather than thinking about any other uses which they may be put to.

I think they would be useful to expose though, so only thinking about
what replication needs isn't helpful either.

> My thinking so far is that:
> 
>   - A new UUID should be created when a database is created from scratch.
> 
>   - A new UUID should be created when a database is replaced (eg, with 
> one of the OVERWRITE options of the open method).
> 
>   - If a database is modified in any way (which will always involve a 
> transaction occurring), the UUID should be left unchanged.  This allows 
> a UUID/revision_number combination to identify a particular revision of 
> a database.
> 
>   - If a database is replicated, the replica should have the same UUID 
> as the original (because, to all intents and purposes, it _is_ the 
> original).

OK so far.

>   - If the replica of the database is then modified (eg, by adding a 
> document to it), the UUID of the replica should be changed.  This is 
> because, otherwise, after the next change on the source database, there 
> will be two different databases in existence with the same UUID/revision 
> combination, but different contents.  This would be bad because a 
> future replication from the source database to the modified database 
> wouldn't notice the differences, and would be likely to apply an 
> incompatible set of changes, leading to a corrupt database.

I'm not sure I agree.  I can see the reasoning, but you can clone a
database with rsync or cp and we aren't going to magically be able to
change the UUID of one copy then.  Similarly, you can rollback a
database to a backup copy taken at an older revision, and then the
UUID+revision_number combination isn't unique either.  If we only
"solve" this for replicating to updated replicas, then we've really
only handled one case, and in the process warped the UUID semantics
for other potential users (and added quite a lot of complication
to the code from the sound of it).

It seems that either you need a scheme which makes replication work
robustly in the face of such situations, or this is just going to be a
problem that needs to be taken into account when designing systems.

Similarly, you can replicate a database using rsync, but not while it is
being updated, and you can't reliably search the copy while it is being
updated.  Those are just limitations you have to handle if you replicate
using rsync.

>   - In normal usage, we'd never expect a replicated database to receive 
> any updates, so this is unlikely to be a problem.  However, one idea 
> I've pondered for the future, for redundancy purposes, is to allow a 
> pool of replicated databases to be built, in which there would be one 
> master database.  Updates would be sent to all clients, but only the 
> master would normally index them; all the other databases would 
> replicate against the master.  If the master died, an election would be 
> held to elect a new master, and that master would apply the updates to 
> its databases, and serve the clients with replications.  If the original 
> master then came back online, and the UUID in the new master with 
> updates was the same as in the old master with its (different) updates, 
> it would be impossible for clients to distinguish the two, again 
> potentially leading to attempts to combine incompatible changesets, and 
> eventually to broken databases.  (I've no intention of writing the code 
> for such a pooling system at present, but I'd like not to preclude it, 
> either.)

For this, you could just change the UUID of the new master at the
investiture ceremony following the election.  You'd probably want to
clean up the new master's files anyway.

>   - The replication protocol currently works by a client sending a 
> message to the server saying "please give me the necessary updates to 
> transform a database with a given UUID and a given revision number into 
> a more recent version of the database".  Therefore, for efficient 
> updates, the server needs to remember that the old revision number 
> applied for databases before a particular revision number.

Do you mean "the old UUID" not "old revision number"?
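Either way, as I understand it the request essentially boils down to the
following (a rough sketch; the type and field names are invented here and
aren't the actual wire format):

    #include <stdint.h>
    #include <string>

    // Sketch only: the client says which database and revision it holds;
    // the server either sends changesets bridging the gap or falls back
    // to a full copy.
    struct ReplicationRequest {
        std::string uuid;      // UUID of the database the client holds
        uint32_t revision;     // revision the client is currently at
    };

    struct ReplicationResponse {
        bool full_copy_needed; // true if the server can't bridge the gap
        // ... followed by either changeset data or a full database copy
    };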

> I think the upshot of this is that:
> 
>   1. A better UUID generation mechanism would be good, to reduce the 
> chance of random conflicts between databases.

Yes, we should just use one of the standard UUID forms.  We need to have
a plan for flint though, as existing flint databases won't have one.
We could perhaps create one if such a database is replicated.  Or just
the first time that get_uuid() is called on it if it's a public method.
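Generating one is easy enough with libuuid - a sketch (how the result
gets persisted into the flint metadata is left out, and the helper names
here are just illustrative):

    #include <uuid/uuid.h>   // libuuid (link with -luuid)
    #include <string>

    // Sketch only: generate a standard (RFC 4122) UUID as a lowercase
    // string.
    std::string generate_uuid() {
        uuid_t raw;
        char buf[37];        // 36 characters plus the terminating NUL
        uuid_generate(raw);
        uuid_unparse_lower(raw, buf);
        return std::string(buf);
    }

    // For an existing flint database with no UUID, make one the first
    // time it's needed; "stored" stands in for whatever the metadata
    // currently holds (empty for old databases).
    std::string get_or_create_uuid(std::string& stored) {
        if (stored.empty())
            stored = generate_uuid();   // caller would persist this
        return stored;
    }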

>   2. Replicas need to keep track of a flag saying that they're a replica 
> of a database, not the original database, so that they know that they 
> need to pick a new UUID if modifications are made to them (other than 
> via the replication protocol).
>
>   3. Databases need to keep track of the UUID which they had at any of 
> the revisions for which they have changesets available.  Perhaps the 
> easiest way to implement this would be to store the UUID of the database 
> in the changeset files; the information is then available to be checked.
> 
> (1) is important before the replication code is put into a release, I 
> think, to reduce the chance of conflicts, which would lead to corrupt 
> databases.
> 
> (2) would also be good to fix before a release, since otherwise 
> perfectly valid code could result in a corrupt database.  (Note: until 
> recently, it wasn't possible (without hackery) to modify a replicated 
> database, because replicas are accessed via stub databases.  However, Olly 
> has recently made it valid to open stub databases for writing, as long 
> as they contain exactly one sub-database).

I think the particular case you highlight is better handled by simply
recording the latest replicated revision in the replica directory.  Then
you can tell if the replica has been modified locally and do something
appropriate (such as throw an error or perform a full copy of the
master).
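A sketch of what I mean (the names here are invented for illustration):

    #include <stdint.h>
    #include <string>

    // Sketch only: state recorded in the replica directory at the end of
    // each successful replication.
    struct ReplicaState {
        std::string uuid;              // UUID at the last replication
        uint32_t last_replicated_rev;  // revision at the last replication
    };

    // A local commit bumps the revision past what replication left, so a
    // mismatch means "don't apply changesets; throw or recopy the master".
    bool modified_since_replication(const ReplicaState& state,
                                    uint32_t current_rev) {
        return current_rev != state.last_replicated_rev;
    }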

I'm not sure I'd regard opening the Xapian database inside the replica
database as "invalid code" FWIW.

> (3) is less important, because it would simply result in full-database 
> copies occurring where changesets could otherwise have been applied, 
> leading to an efficiency drop, rather than corrupt databases.  However, 
> it would be 
> best to fix the changeset files to hold the information required before 
> a release, even if we don't actually write the code to search through 
> them if the database UUID doesn't match, so that incompatible changes 
> aren't required to implement this in future.

I think this is only relevant if you're changing the UUID?
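But if we do store it, the check when applying a changeset is cheap
enough - roughly (field and type names invented for illustration):

    #include <stdint.h>
    #include <string>

    // Sketch only: a changeset records the UUID of the database it came
    // from and the revision range it covers.
    struct ChangesetHeader {
        std::string uuid;          // UUID of the source database
        uint32_t start_revision;   // revision the changeset applies to
        uint32_t end_revision;     // revision it produces
    };

    // Refuse to apply a changeset generated from a different database,
    // or one which doesn't start from the revision we're actually at.
    bool changeset_applies(const ChangesetHeader& h,
                           const std::string& db_uuid,
                           uint32_t db_revision) {
        return h.uuid == db_uuid && h.start_revision == db_revision;
    }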

Cheers,
    Olly
