[Xapian-discuss] Feature request: Lighten pressure on backup

Olly Betts olly at survex.com
Mon Mar 31 05:09:05 BST 2008

On Mon, Mar 24, 2008 at 07:07:38AM +0100, Jesper Krogh wrote:
> The suggestion would be to split the files into several smaller files. I
> know that the algorithms for searching the binary trees probably would
> be a bit more complex, but it could mean that changes only touch
> a subset of the files, thus letting the backup proceed more easily.

This idea seems problematic.  We'd either need to keep a lot more files
open (and file handles are a limited resource, though the limit is
reasonable for most modern OSes), or manage opening and closing them,
which will incur system call overheads, and may cause undesirable cache
flushing behaviour.

And for a system which updates old records, it doesn't even relieve the
backup system much - a single updated document (or term, for the
postlist table) in a chunk of the table means that the whole chunk
needs to be backed up.  Splitting helps a lot for a single document
update, but progressively less for 2, 3, 4, ... updates, unless you only
add new documents.
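
To put rough numbers on that, here is a small sketch of how many chunks
end up dirty as the number of updates grows.  It assumes each update
lands in a uniformly random chunk, which is an assumption made for
illustration only (real update patterns may well be clustered), and the
function names and the figure of 100 chunks are hypothetical.

```python
import random


def chunks_touched(num_chunks, num_updates, rng):
    """Count distinct chunks hit when each update lands in a random chunk."""
    return len({rng.randrange(num_chunks) for _ in range(num_updates)})


def expected_fraction_touched(num_chunks, num_updates):
    """Expected fraction of chunks that contain at least one update.

    Assumes updates are independent and uniformly spread over the chunks.
    """
    return 1.0 - (1.0 - 1.0 / num_chunks) ** num_updates


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    num_chunks = 100        # hypothetical number of smaller files
    for updates in (1, 10, 100, 1000):
        expected = expected_fraction_touched(num_chunks, updates)
        simulated = chunks_touched(num_chunks, updates, rng) / num_chunks
        print(f"{updates:5d} updates: expected {expected:.2f}, "
              f"simulated {simulated:.2f}")
```

Under that assumption, one update dirties 1% of the chunks, but with as
many updates as there are chunks roughly 63% of them need backing up
again, and the benefit keeps shrinking from there.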

I think a better way to ease the backup pain would be to build upon the
database replication functionality which should be in 1.1.0 (unless
there's a major issue found which we can't address in time).

This would allow a truly incremental backup - you'd save away a file
which describes the changes since the last backup and which can be
replayed to update the previous version of the database fairly
efficiently.  The size of the incremental file should be proportional
to the size of the changes.
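
As a toy illustration of the replay idea, the sketch below treats a
database as a plain Python dict, writes the changes since the last
backup out as a list of operations, and replays them onto the old copy.
This is not Xapian's replication format or API; the operation names and
the JSON encoding are assumptions made up for the example, and a real
implementation would presumably log changes as they are committed
rather than diff two full copies as done here.

```python
import json


def diff_changes(old, new):
    """Return a list of operations that turns dict `old` into dict `new`."""
    ops = []
    # Keys present in the old copy but missing from the new one were deleted.
    for key in sorted(old.keys() - new.keys()):
        ops.append({"op": "delete", "key": key})
    # Keys that are new, or whose value differs, need to be (re)written.
    for key, value in new.items():
        if key not in old or old[key] != value:
            ops.append({"op": "set", "key": key, "value": value})
    return ops


def replay(snapshot, ops):
    """Apply operations to a copy of `snapshot` and return the updated copy."""
    result = dict(snapshot)
    for op in ops:
        if op["op"] == "set":
            result[op["key"]] = op["value"]
        elif op["op"] == "delete":
            result.pop(op["key"], None)
        else:
            raise ValueError(f"unknown operation: {op['op']!r}")
    return result


if __name__ == "__main__":
    last_backup = {"doc1": "alpha", "doc2": "beta", "doc3": "gamma"}
    live = {"doc1": "alpha", "doc2": "BETA", "doc4": "delta"}

    # The serialised operation list plays the role of the incremental file.
    incremental = json.dumps(diff_changes(last_backup, live))

    restored = replay(last_backup, json.loads(incremental))
    print(restored == live)
```

The incremental file here holds three operations (one delete, two sets)
however large the unchanged part of the database is, which is the
property that matters for backup.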

