DatabaseModifiedError while iterating on mset
Olly Betts
olly at survex.com
Wed Aug 30 03:26:44 BST 2023
On Mon, Aug 28, 2023 at 07:14:08AM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > If you only look at the terms and wdfs then you could only get
> > DatabaseModifiedError on the call to create the TermIterator since the
> > list of terms and wdfs is stored in a single entry per document which
> > is fetched when the iterator is created (it is conceivable this might
> > be different for a new database backend in the future I suppose).
>
> Oh wow. In Perl, I only had a retry_reopen wrapper around
> the get_mset call to reopen the DB, because documents get added
> frequently:
>
> my $mset = retry_reopen(sub { $enq->get_mset(0, 1000) });
> for my $m ($mset->items) {
> ...
> }
>
> But the above was actually unsafe against modifications, and
> I should be doing the following?
>
> my $mset = retry_reopen(sub { $enq->get_mset(0, 1000) });
> my $cur = retry_reopen(sub { $mset->begin });
> my $end = retry_reopen(sub { $mset->end });
> for (; $cur != $end; retry_reopen(sub { $cur++ })) {
> ...
> }
MSetIterator is really just a wrapper around an integer index
(plus a reference to an MSet) so there's definitely no database
access from operations like creating one or iterating it.
The part I was referring to is calling get_document() and then methods
on the returned Document object such as termlist_begin().
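Here is a minimal Python sketch of where the retry belongs, written under stated assumptions. `FakeDatabase`, `get_termlist()` and this `retry_reopen()` are hypothetical stand-ins: the helper is modelled loosely on Eric's Perl wrapper, and the exception class only mimics Xapian's `DatabaseModifiedError` so that the code runs without the Xapian bindings. The point it illustrates is that walking the MSet needs no wrapper. Only the per-document read (get_document() followed by termlist_begin()) does.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for Xapian's DatabaseModifiedError."""


class FakeDatabase:
    """Toy reader whose next `stale_reads` reads fail until reopen()."""

    def __init__(self, termlists, stale_reads=0):
        self.termlists = termlists      # docid -> list of (term, wdf)
        self.stale_reads = stale_reads  # reads that fail before a reopen
        self.reopens = 0

    def reopen(self):
        self.reopens += 1
        self.stale_reads = 0

    def get_termlist(self, docid):
        # Models get_document(docid) + termlist_begin(): the only step
        # here that actually reads from the database.
        if self.stale_reads > 0:
            self.stale_reads -= 1
            raise DatabaseModifiedError("revision overwritten by writer")
        return list(self.termlists[docid])


def retry_reopen(db, fn, attempts=3):
    """Call fn(); on DatabaseModifiedError, reopen db and try again."""
    for _ in range(attempts - 1):
        try:
            return fn()
        except DatabaseModifiedError:
            db.reopen()
    return fn()  # final attempt: let any error propagate


db = FakeDatabase({1: [("foo", 2)], 2: [("bar", 1)]}, stale_reads=1)
mset_docids = [1, 2]  # stands in for the MSet; iterating it reads nothing

results = {}
for docid in mset_docids:
    # Only the document/termlist fetch is wrapped, not the loop itself.
    results[docid] = retry_reopen(db, lambda: db.get_termlist(docid))
print(results)  # {1: [('foo', 2)], 2: [('bar', 1)]}
```

The lambda is called immediately inside each iteration, so it always sees the current `docid`.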
The various end iterator methods can't actually throw: they return a
fixed-value object, which turns an iterator == end comparison into a
NULL pointer or integer 0 check. (That is an implementation detail, but
one that's pretty much nailed down by the alternative API we now expose
via `xapian/iterator.h`.)
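The following toy model illustrates the iterator point. The iterator holds an MSet reference and an integer, and end() returns a fixed value, so stepping and comparing never touch a database. Storing the offset from the end, so that end is simply offset 0, is one plausible way to get the "integer 0 check" described above. The class names are hypothetical, and this is not Xapian's actual source.

```python
class ToyMSet:
    """Toy MSet: results are already materialised; no database handle."""

    def __init__(self, docids):
        self._docids = list(docids)

    def begin(self):
        return ToyMSetIterator(self, len(self._docids))

    def end(self):
        # A fixed value (offset 0 from the end); nothing here can fail.
        return ToyMSetIterator(self, 0)


class ToyMSetIterator:
    """An MSet reference plus an integer offset from the end."""

    def __init__(self, mset, off_from_end):
        self._mset = mset
        self._off_from_end = off_from_end

    def __eq__(self, other):
        # Comparing with end() reduces to an integer comparison.
        return self._off_from_end == other._off_from_end

    def docid(self):
        docids = self._mset._docids
        return docids[len(docids) - self._off_from_end]

    def advance(self):
        # Stepping is pure arithmetic: no database access.
        self._off_from_end -= 1


mset = ToyMSet([7, 3, 9])
it, end = mset.begin(), mset.end()
seen = []
while it != end:
    seen.append(it.docid())
    it.advance()
print(seen)  # [7, 3, 9]
```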
> I suppose DocumentNotFound errors can also happen while
> iterating an MSet if a writer is deleting documents, right?
If you reopen the database but don't rerun get_mset(), then you should
indeed get DocumentNotFound if a document in the MSet has since been
deleted.
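This sketch shows the consequence using a deliberately simplified toy reader. Here any writer commit invalidates the reader immediately, which is stricter than a real Xapian backend. All names (`FakeDatabase`, `search()`, `fetch_all()`, and the two exception classes) are hypothetical stand-ins. Rerunning the search after each reopen means the docids always come from the current revision. Reusing old docids after a reopen is exactly where a deleted document surfaces as a not-found error.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for Xapian's DatabaseModifiedError."""


class DocumentNotFoundError(Exception):
    """Hypothetical stand-in for Xapian's document-not-found error."""


class FakeDatabase:
    """Toy reader that sees a snapshot until reopen(); a writer may delete."""

    def __init__(self, docs):
        self.live = dict(docs)      # what the writer has committed
        self.snapshot = dict(docs)  # what this reader currently sees
        self.stale = False

    def writer_delete(self, docid):
        del self.live[docid]
        self.stale = True  # simplification: any commit invalidates the reader

    def reopen(self):
        self.snapshot = dict(self.live)
        self.stale = False

    def search(self, word):
        # Stands in for enquire.get_mset(): returns matching docids.
        return [d for d, text in sorted(self.snapshot.items())
                if word in text.split()]

    def get_document(self, docid):
        if self.stale:
            raise DatabaseModifiedError()
        if docid not in self.snapshot:
            raise DocumentNotFoundError(docid)
        return self.snapshot[docid]


def fetch_all(db, word, attempts=3):
    """Rerun the whole query after each reopen so docids are never stale."""
    for attempt in range(attempts):
        docids = db.search(word)
        try:
            return [(d, db.get_document(d)) for d in docids]
        except DatabaseModifiedError:
            if attempt == attempts - 1:
                raise
            db.reopen()
    return []  # only reached if attempts == 0


db = FakeDatabase({1: "apple pie", 2: "apple tart", 3: "pear"})
db.writer_delete(2)            # a writer commits while we are reading
print(fetch_all(db, "apple"))  # [(1, 'apple pie')]

# Contrast: reopening but reusing docids from the old search.
db = FakeDatabase({1: "apple pie", 2: "apple tart"})
old_docids = db.search("apple")  # [1, 2]
db.writer_delete(2)
db.reopen()
# db.get_document(old_docids[1]) would now raise DocumentNotFoundError.
```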
I wonder how hard it would be to sort out reader locking so that we
could provide actual MVCC (multi-version concurrency control). It'd
likely only be for platforms with fcntl() locking for now (and is
probably easier with F_OFD_SETLK, which is currently only supported by
Linux but has been accepted by POSIX).
It might need to be opt-in via a flag on opening a Database object, but
that'd be a whole lot better than having to catch and handle
DatabaseModifiedError.
Cheers,
Olly
More information about the Xapian-discuss mailing list