[Xapian-discuss] Atomic DB rebuilds

Sam Liddicott sam at liddicott.com
Wed Oct 6 13:11:31 BST 2004


I'm not complaining, but I think the conclusion is that there is 
currently no way to atomically replace a xapian database that is in 
production use.
(OK, it would be possible if the xapian client used a different
database name.)

I think replacing the DB in place is just a cruder form of one large 
transaction on the database. Ironically, it is an operation that I only 
run so often because the transaction is relatively small; or rather, it 
is one that I could equally expect to manage with btree transactions 
given enough memory (and there is enough).

This suggests to me that it is the wrong solution, and that I probably 
should be making incremental changes to xapian to mirror the sql database.

In my case the xapian database records are a union of joins on the sql 
database, or a mix of de-normalised data.
This means that the edit of one record in the sql database could result 
in many xapian records being rebuilt (or deleted).

My xapian records have a term to relate to the primary keys of the 
various sql tables from which they were drawn.

I think the update strategy should be to update in xapian all records 
that contain sql rows that have changed. This can be done simply by 
adding, to the join that feeds db2omega, a condition that selects any 
rows whose ids have changed; the right scriptindex data will then be 
generated.

Deleting from xapian all records that refer to deleted sql rows is a 
bit different: scriptindex requires the unique id of a record in order 
to delete it.
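
A rough sketch of the update/delete strategy described above, in Python.
It deliberately does not use the Xapian API: ToyIndex is an in-memory
stand-in, and the "XK<table>:<id>" term naming, key_term(),
apply_changes() and the rebuild callback are all hypothetical names
invented for illustration. The rebuild callback stands in for re-running
the sql join for one record; returning None means the record no longer
exists and should be deleted.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory stand-in for a term index (NOT the Xapian API)."""

    def __init__(self):
        self.docs = {}                    # docid -> set of terms
        self.postings = defaultdict(set)  # term -> set of docids

    def delete(self, docid):
        # Remove the document and all of its postings, if present.
        for term in self.docs.pop(docid, ()):
            self.postings[term].discard(docid)

    def replace(self, docid, terms):
        # Drop any old version, then store the new terms.
        self.delete(docid)
        self.docs[docid] = set(terms)
        for term in terms:
            self.postings[term].add(docid)

    def docs_with(self, term):
        return set(self.postings.get(term, ()))


def key_term(table, pk):
    # Hypothetical scheme: one term per (sql table, primary key).
    return "XK%s:%s" % (table, pk)


def apply_changes(index, changed, deleted, rebuild):
    """Rebuild or delete every document that refers to a touched row.

    changed, deleted: iterables of (table, primary_key) pairs.
    rebuild(docid): returns the new terms, or None to delete.
    """
    stale = set()
    for table, pk in list(changed) + list(deleted):
        stale |= index.docs_with(key_term(table, pk))
    for docid in stale:
        terms = rebuild(docid)
        if terms is None:
            index.delete(docid)
        else:
            index.replace(docid, terms)
    return stale
```

Because every record carries a term for each sql row it was drawn from,
the records affected by a deleted row can be found by term without
knowing their unique ids in advance. This sketch only touches records
that already exist in the index; rows that produce brand-new records
would still have to come from the join itself.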

I think it's time for me to look at the Perl bindings again and do 
something with those.

Sam


Olly Betts wrote:

>On Tue, Oct 05, 2004 at 02:07:08PM +0100, Olly Betts wrote:
>
>>Looks like we need to canonicalise the path to the database directory
>>by eliminating symlinks and making it absolute.
>
>Done a little more digging and this is a really nasty problem.  The
>answer ought to be to use realpath (man 3 realpath) but unfortunately
>the design is broken because you pass a pointer for the result which
>has to be a buffer of size PATH_MAX which on some platforms may be huge
>and unsuitable for mallocing, or worse still it might be -1 (meaning
>unbounded).  So it's impossible to use portably without risking a buffer
>overflow.
>
>You can't roll your own portably, since you can't portably use the value
>returned by readlink as a path you can actually use.  And while you
>can use `open(".") / chdir(path) / getcwd() / fchdir()', that's not
>ideal because any signal handler called will get the wrong current
>directory:
>
>http://sources.redhat.com/ml/libc-alpha/2001-09/msg00228.html
>
>Actually, I think the `open(".") / chdir(path) / getcwd() / fchdir()'
>approach is probably the least bad, most portable solution for us.  We
>can even look at blocking signal delivery for the critical part.
>
>And we can probably just use realpath where the buffer size is sane (I
>found some LGPL code for that).
>
>Cheers,
>    Olly
>
>_______________________________________________
>Xapian-discuss mailing list
>Xapian-discuss at lists.xapian.org
>http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
>  
>
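
For reference, the `open(".") / chdir(path) / getcwd() / fchdir()'
sequence quoted above can be sketched in Python on a Unix-like system as
follows. The function name canonical_dir is made up for illustration,
and this is only a sketch of the idea, not what Xapian does. It does not
block signal delivery, so a signal handler running during the critical
part would still see the wrong current directory, as the quoted message
points out.

```python
import os


def canonical_dir(path):
    """Return an absolute, symlink-free path for the directory `path`."""
    saved = os.open(".", os.O_RDONLY)  # remember the current directory
    try:
        os.chdir(path)                 # the kernel resolves any symlinks
        try:
            return os.getcwd()         # absolute path of where we landed
        finally:
            os.fchdir(saved)           # always return to where we started
    finally:
        os.close(saved)
```

The current directory is process-wide state, so in a threaded program
other threads would also see the temporary change.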
