[Xapian-tickets] [Xapian] #434: reopen should re-read stub database file
Xapian
nobody at xapian.org
Mon Feb 1 20:28:15 GMT 2010
#434: reopen should re-read stub database file
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: defect | Status: new
Priority: normal | Milestone: 1.1.7
Component: Library API | Version: SVN trunk
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
-------------------------+--------------------------------------------------
Currently, when a database is opened via a stub db file (or a directory
containing a XAPIANDB stub file), calling reopen() on that database does
not re-read the stub db file; it just reopens the databases which were
listed in that file at the time of the first open.
This is probably desirable in some cases - if one of the databases listed
in the stub db file is an inmemory database, for example. It's also
somewhat helpful if the stub db file is not expected to change, since this
avoids unwanted overhead in the call to reopen().
However, it's definitely undesirable in the case of a database produced by
replication: the replication client creates a directory containing a
XAPIANDB file which points to a subdirectory called either replica_0 or
replica_1 - the subdirectory in use is "toggled" when a full database copy
from the remote server is performed. Users of replication don't know
about this detail, so would reasonably expect reopen() to simply give
them the latest copy of the database retrieved by the replication client.
Instead, if a single copy has occurred since opening, reopen() will throw
a DatabaseOpeningError, complaining that the database files are missing,
and if a second copy has occurred, reopen() will usually appear to
succeed, but will later complain that the database is corrupt, since the
database has been replaced by a completely new database.
My feeling is that reopen() is producing "surprising" behaviour here, from
most users' point of view, and that we should change it to re-read the stub
database file. After all, one way to open a stub database is to pass its
path to the Xapian::Database constructor - it behaves like a database, so
calling reopen() ought to reopen the stub file and re-read it.
It might be nice to retain the existing behaviour for reopen() to allow
users who fully understand the situation to avoid the overhead of
re-reading the stub file. I'm not fully convinced the performance gain here
outweighs the benefit of a less confusing behaviour, but if we want to do
that, one approach would be to have an optional parameter added to
reopen() which avoids re-reading stub databases if set. Alternatively,
there could be a flag set in a stub database somehow to indicate that the
stub database shouldn't be re-read on reopen. I don't think this should
be set by default, though.
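
To make the optional-parameter idea concrete, here is a toy model of it in
C++. It is only a sketch: FakeDisk, StubHandle, full_copy() and the
reread_stub parameter are all hypothetical names invented for illustration,
and nothing here is Xapian code. The "disk" is just a string holding the
stub's current target plus the set of subdirectories that exist.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Toy model only: none of these types are part of Xapian.
struct FakeDisk {
    std::string stub_target;          // what the XAPIANDB stub points at now
    std::set<std::string> live_dirs;  // subdirectories that currently exist
};

class StubHandle {
    const FakeDisk& disk_;
    std::string resolved_;  // target taken from the stub when last read

    // Fail the way a real open would if the files have gone away.
    void open_resolved() const {
        if (disk_.live_dirs.count(resolved_) == 0)
            throw std::runtime_error("database files missing: " + resolved_);
    }

public:
    explicit StubHandle(const FakeDisk& disk)
        : disk_(disk), resolved_(disk.stub_target) {
        open_resolved();
    }

    // Hypothetical signature: re-read the stub unless the caller opts out.
    // Passing false models the current behaviour described in this ticket.
    void reopen(bool reread_stub = true) {
        if (reread_stub) resolved_ = disk_.stub_target;
        open_resolved();
    }

    const std::string& target() const { return resolved_; }
};

// Simulate a full database copy: the replica in use is toggled and the
// previous subdirectory is removed.
inline void full_copy(FakeDisk& disk) {
    assert(disk.live_dirs.count(disk.stub_target) == 1);
    const std::string next =
        disk.stub_target == "replica_0" ? "replica_1" : "replica_0";
    disk.live_dirs.insert(next);
    disk.live_dirs.erase(disk.stub_target);
    disk.stub_target = next;
}
```

In this model, a handle opened before one full_copy() follows the toggle
when reopen() is called with the default argument, while reopen(false)
throws because the old subdirectory is gone. The model does not cover the
second-copy case, where the old directory name exists again but holds a
completely different database.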
My belief that the current behaviour is surprising is partly because I've
been confused by it. I think that the cause of the confusion is twofold.
Partly, the name of "reopen" implies (to me, anyway) that the result
should be the same as closing the database and opening it again. For a
replicated database, the stub file is "hidden", so the user will be
surprised by this behaviour; but I was surprised by this behaviour even
knowing about the internal details of stub databases. If the method were
called "move_to_latest_revision" it might be easier to explain (but only
if users also understood the difference between a replication step which
applied changesets, and one which performed a full database copy).
Marking this ticket for 1.1.7, since we should fix it or document a
workaround before 1.2: it is relevant to the replication functionality.
The easy workaround is "don't use reopen: just open replicated databases
from scratch each time you need to update them".
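
A minimal sketch of that workaround in C++ follows. The helper name
reopen_from_scratch() is hypothetical. It is written as a template so that
it does not depend on the Xapian headers; the assumption is that Db is a
type such as Xapian::Database which can be constructed from a path and
assigned.

```cpp
#include <cassert>
#include <string>

// Instead of calling db.reopen(), construct a new handle from the same
// path and assign it over the old one, so the stub file is read again.
// Db is assumed to be constructible from a std::string path and
// assignable, as Xapian::Database is.
template <class Db>
void reopen_from_scratch(Db& db, const std::string& path) {
    assert(!path.empty());
    db = Db(path);
}
```

With Xapian this would be used as reopen_from_scratch(db, path), where db
is a Xapian::Database and path is the replica directory it was originally
opened from. The cost is a full open each time, which is the overhead the
existing reopen() behaviour avoids.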
--
Ticket URL: <http://trac.xapian.org/ticket/434>
Xapian <http://xapian.org/>