[Xapian-discuss] DatabaseModifiedError

Michel Pelletier pelletier.michel at gmail.com
Wed Mar 23 15:06:18 GMT 2011


On Wed, Mar 23, 2011 at 4:02 AM, Andrew Betts <andrew.betts at assanka.net> wrote:
>
> On 23 Mar 2011, at 10:35, Richard Boulton wrote:
>
>> On 23 March 2011 10:16, Andrew Betts <andrew.betts at assanka.net> wrote:
>>> Hi,
>>>
>>> I am starting to see more frequent occurrences of a DatabaseModifiedError when users execute searches on my site.  Since our index is being updated all the time, it's easy to understand why these happen, but is it possible to ignore this error and execute the search against the stale revision?  It's likely only a few seconds out of date, and if I follow the advice and reopen() the database, it's quite possible that it will have been modified again before I manage to call get_mset.
>>
>> Xapian can't be told to ignore the error and use the stale revision,
>> because the error is only thrown when the data Xapian needs from the
>> old revision has already been overwritten by new data.  If you're
>> searching on version N, you won't get the error until after version
>> N+1 has been committed.  After version N+1 has been committed, changes
>> to the database (i.e., leading to version N+2) will start overwriting
>> blocks which were used in version N (but not in version N+1), and
>> there is then a risk of the exception being thrown.
>>
>> Assuming you're updating the index every few seconds, and searches
>> take no more than a couple of seconds in the worst case, you could
>> perhaps insert a pause after committing a change before performing any
>> further modifications to the database, to allow readers a chance to
>> reopen on the new revision before risking overwriting blocks from the
>> old revision.
>
> Hmm.  The problem is that, to give us high availability, the process works like this:
>
> 1. Editor clicks save on a post.
> 2. Indexing job is posted to a RabbitMQ message queue
> 3. Two instances of a search indexer daemon, running on different servers, are consuming the queue.  One of them will pick up the job
> 4. The daemon opens the Xapian db from a distributed shared filesystem, makes the change and commits it.

This is very similar to our architecture: an editor clicks save on a
post and a RabbitMQ message is transmitted, but we have only one writer
(it's a dirt simple process, and highly available due to its
simplicity; it's not redundant, but we monitor it and it hasn't been a
problem).  I don't see how that changes the proposed solution, though:
your readers should run their searches inside your language's
exception handling (try...except DatabaseError in the case of Python)
and retry the search up to N times.  We use a slight backoff as well.
Here's the logic we use:

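    # Method of our reader wrapper class; assumes 'import time' and
    # 'import xapian', plus a module-level 'logger' and RETRY_LIMIT.
    def retry_if_modified(self, operation, limit=RETRY_LIMIT, refresh=True):
        tries = 0
        while True:
            try:
                return operation()
            except xapian.DatabaseError as e:
                if tries >= limit:
                    # Out of retries: log and re-raise the original error.
                    logger.warning('%s after %s retries, failing.', e, tries)
                    raise
                logger.info('%s: after %s retries, retrying', e, tries)
                # Linear backoff: 0s, 0.1s, 0.2s, ...
                time.sleep(tries * 0.1)
                # Move the reader to the latest revision before retrying.
                self.reopen(refresh=refresh)
                tries += 1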

'refresh' tells our reopen() whether or not to refresh metadata (which
might itself raise a modified error and need to be retried).
This pattern has worked very well for us in a highly replicated
environment.  We don't use a shared filesystem; is that what you are
using to 'replicate' your db to your readers?  Have you considered
using xapian-replication instead?  This will apply changes to your
readers in transactional chunks, as Chris suggested, and may widen your
modification-free query window.
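
For anyone who wants to try the retry pattern without a live index, here
is a minimal standalone sketch.  It is not our production code: the
DatabaseModifiedError class is a stand-in for xapian.DatabaseModifiedError
so the example runs without Xapian installed, and FlakyReader, base_delay,
retry_on and the injectable sleep are all hypothetical names made up for
the illustration.

```python
import time


class DatabaseModifiedError(Exception):
    """Stand-in for xapian.DatabaseModifiedError (assumption for this sketch)."""


def retry_if_modified(operation, reopen, limit=5, base_delay=0.1,
                      retry_on=(DatabaseModifiedError,), sleep=time.sleep):
    """Call operation(); on a retry_on error, back off, reopen, and retry."""
    tries = 0
    while True:
        try:
            return operation()
        except retry_on:
            if tries >= limit:
                raise  # out of retries: let the caller see the error
            sleep(tries * base_delay)  # linear backoff: 0, 0.1, 0.2, ...
            reopen()                   # move the reader to the latest revision
            tries += 1


class FlakyReader:
    """Hypothetical reader whose search fails until it is reopened twice."""

    def __init__(self):
        self.reopens = 0

    def reopen(self):
        self.reopens += 1

    def search(self):
        if self.reopens < 2:
            raise DatabaseModifiedError("revision overwritten")
        return ["doc1", "doc2"]


reader = FlakyReader()
delays = []  # record the requested delays instead of actually sleeping
results = retry_if_modified(reader.search, reader.reopen, sleep=delays.append)
```

With the real bindings you would presumably pass
retry_on=(xapian.DatabaseModifiedError,), reopen=db.reopen, and an
operation such as lambda: enquire.get_mset(0, 10).  Catching only the
modified error, rather than every DatabaseError as we do, is a design
choice that lets unrelated failures surface immediately.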

-Michel

>
> So I can't guarantee a pause in indexing of any particular length because there are multiple independent indexers.  Is there a way around this or should I reconsider this architecture?


