[Xapian-devel] buffered tables, sessions, and transactions

Richard Boulton richard at lemurconsulting.com
Wed May 19 17:31:04 BST 2004


Olly Betts wrote:
> I've noticed a slight wrinkle - currently if a database is destroyed
> with an active transaction, cancel_transaction is called.  Without
> this method, if an error is thrown part way through a transaction,
> either we apply the whole lot, or lose everything since the last call to
> flush() (explicit or implicit).  That could be several transactions, so
> at least this preserves atomicity better than just flushing.

I think the possible situations are:

1) Users don't require any guarantees other than that the database
    remains in a consistent state, since they can easily replay all their
    data if an error occurs (and use a unique ID to check if each item
    was already indexed).

2) Users want to be able to call something ("flush()") to ensure that
    changes up to a certain point are not lost from the database if an
    error occurs in future.

3) Users want to be able to ensure that a group of modifications (eg, an
    insert and delete pair, or something more complex) are atomically
    added to the database.  If an error occurs part way through the
    group, the entire group must be discarded (possibly along with other
    modifications, if a flush() wasn't called before entering the group).
    Once the group is complete, it may still get discarded if an error
    occurs before flush() has been called.

4) As (3), but users want to be able to change their mind part-way
    through a group of modifications (perhaps due to an error outside
    Xapian) and cancel the whole group.
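
To illustrate situation (1), here is a minimal sketch of replaying input after an error and using a unique ID to skip items which were already indexed.  The Index class, the Item struct and replay() are hypothetical stand-ins invented for this example, not part of the Xapian API.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical item carrying a caller-supplied unique ID.
struct Item {
    std::string id;
    std::string text;
};

// Hypothetical stand-in for a database which can be queried by unique ID.
class Index {
    std::set<std::string> ids_;

public:
    bool contains(const std::string& id) const { return ids_.count(id) != 0; }
    void add(const Item& item) { ids_.insert(item.id); }
    std::size_t size() const { return ids_.size(); }
};

// Replay the whole input, skipping anything which survived the error.
// Returns the number of items actually added by this replay.
std::size_t replay(Index& index, const std::vector<Item>& items) {
    std::size_t added = 0;
    for (const Item& item : items) {
        if (index.contains(item.id)) continue;  // already indexed
        index.add(item);
        ++added;
    }
    return added;
}
```

With this approach the only guarantee needed from the database is that it stays consistent; the caller recovers lost work by running replay() again over the same input.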


It seems to me that the use of the word "transaction" has various 
connotations which we don't necessarily wish to implement. In 
particular, to say that a transaction is complete sounds to me as if the 
transaction should have been written to disk.

How about removing the transaction methods and implementing:
  begin_group()  - begins a group of modifications (no flush before
                   start).
  end_group()    - ends a group of modifications, but doesn't flush.
  cancel_group() - cancels a group of modifications (and anything else
                   which hasn't been flushed by flush() or autoflush()).

Autoflush will never happen during a group, and an explicit flush() 
called during a group will report an error.
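
The proposed semantics can be sketched with a small in-memory model.  This is only an illustration under assumptions: GroupedWriter, add(), the autoflush threshold and the counting helpers are hypothetical names, nothing here touches the real Xapian classes, and rejecting a nested begin_group() is a choice made for the sketch rather than something proposed above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory model of the proposed group methods.
class GroupedWriter {
    std::vector<std::string> committed_;  // state as of the last flush
    std::vector<std::string> pending_;    // modifications not yet flushed
    bool in_group_ = false;
    std::size_t autoflush_threshold_;     // assumed autoflush trigger

public:
    explicit GroupedWriter(std::size_t autoflush_threshold = 3)
        : autoflush_threshold_(autoflush_threshold) {}

    void add(const std::string& item) {
        pending_.push_back(item);
        // Autoflush never happens during a group.
        if (!in_group_ && pending_.size() >= autoflush_threshold_) flush();
    }

    void begin_group() {
        // Assumption of this sketch: groups do not nest.
        if (in_group_) throw std::logic_error("group already open");
        in_group_ = true;  // no flush before the group starts
    }

    void end_group() {
        if (!in_group_) throw std::logic_error("no group open");
        in_group_ = false;  // changes stay pending; nothing is flushed
    }

    void cancel_group() {
        if (!in_group_) throw std::logic_error("no group open");
        in_group_ = false;
        pending_.clear();  // drops the group and anything else unflushed
    }

    void flush() {
        // An explicit flush during a group reports an error.
        if (in_group_) throw std::logic_error("flush() called during a group");
        committed_.insert(committed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }

    std::size_t committed_count() const { return committed_.size(); }
    std::size_t pending_count() const { return pending_.size(); }
};
```

In this model a caller who wants a group to survive later errors calls flush() after end_group(), and a caller who wants cancel_group() to discard only the group calls flush() just before begin_group().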

Does this make sense, or have I missed something?  I think this would be 
simple to implement (but am a little out of touch with the relevant part 
of the code, so may be missing a problem).

> Perhaps we should *always* require a call to flush() (or a new close()
> method) before a database is destroyed?  At present, any errors thrown
> by the implicit flush() in the destructor are caught and ignored, which
> isn't ideal at all.

I don't like this idea.  Better would be to recommend that flush() is
called before destroying a database - but if it hasn't been, then call
flush() in the destructor and ignore errors.  If users don't care (eg, 
situation 1, above), they don't need to call flush(), but if they do care 
they will call flush() and receive error reports.
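
A rough sketch of that destructor behaviour, using a hypothetical Writer class (not the real Xapian one) whose fail_on_flush constructor argument simulates an I/O error:

```cpp
#include <stdexcept>

// Hypothetical writer; fail_on_flush simulates an error during flush().
class Writer {
    bool dirty_ = false;
    bool fail_on_flush_;

public:
    explicit Writer(bool fail_on_flush) : fail_on_flush_(fail_on_flush) {}

    void add() { dirty_ = true; }

    void flush() {
        if (!dirty_) return;  // nothing to write
        if (fail_on_flush_) throw std::runtime_error("flush failed");
        dirty_ = false;
    }

    ~Writer() {
        // Implicit flush: errors are caught and ignored, so callers who
        // never call flush() themselves never see them.
        try {
            flush();
        } catch (...) {
        }
    }
};

// A caller who cares about errors flushes explicitly before destruction.
// Returns true if the explicit flush() reported an error.
bool explicit_flush_reports_error(bool fail_on_flush) {
    Writer w(fail_on_flush);
    w.add();
    try {
        w.flush();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

The destructor's flush() is then a best-effort fallback, and only the explicit call gives the caller a chance to react to a failure.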

-- 
Richard



