[Xapian-discuss] Xapian across multiple servers

Richard Boulton richard at lemurconsulting.com
Tue Jul 22 18:21:46 BST 2008


Olly Betts wrote:
>> 2) Run Xapian locally on each front-end webserver, but storing the
>> index on shared storage.  This will be I/O intensive, but doesn't
>> involve syncing changes out to each front-end.
> 
> Probably OK if the search load is low.  A high search load on a large
> database will probably get ugly.

Actually, even for a reasonably high search load, if the Xapian database 
is very rarely changed this could work well if the database is small 
enough to get fully (or largely) cached in memory.

However, whenever the database is changed, it is likely that the entire 
database will be dropped from cache, because remote filesystems tend not 
to be very efficient at sharing modifications to large files.  In 
particular, I'm fairly sure that all versions of NFS (well, 2, 3 and 4 
at least) have no support for informing the client that a file has been 
partially modified, so clients have little option other than to drop the 
entire file from cache whenever the mtime of the file is changed. As a 
result, search performance will be poor for a while after any change to 
the database, until the cache is repopulated.
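To make the invalidation argument concrete, here is a toy Python model of a client cache whose only validity check is the file's mtime. It is an illustration under that assumption, not how any real NFS client is implemented. The class `WholeFileCache`, the 4096-byte block size and the file name are all hypothetical. Because the cache cannot tell *which* blocks changed, a one-byte change to one block forces every cached block to be fetched again.

```python
import os
import tempfile

BLOCK = 4096  # hypothetical cache block size


class WholeFileCache:
    """Toy client cache that can only validate a file by its mtime."""

    def __init__(self):
        self.cached_mtime = None  # mtime seen when the cache was last (re)filled
        self.blocks = {}          # block number -> bytes
        self.misses = 0           # reads that had to go back to the "server"

    def read_block(self, path, block_no):
        mtime = os.stat(path).st_mtime_ns
        if mtime != self.cached_mtime:
            # The file changed somewhere; with no partial-change information
            # the only safe option is to drop every cached block.
            self.blocks.clear()
            self.cached_mtime = mtime
        if block_no not in self.blocks:
            self.misses += 1
            with open(path, "rb") as f:
                f.seek(block_no * BLOCK)
                self.blocks[block_no] = f.read(BLOCK)
        return self.blocks[block_no]


def demo():
    """Read a 4-block file twice, modify one byte, then read it again."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "db_file")  # hypothetical database file
        with open(path, "wb") as f:
            f.write(b"x" * BLOCK * 4)
        cache = WholeFileCache()
        counts = []
        for _ in range(2):
            for i in range(4):
                cache.read_block(path, i)
            counts.append(cache.misses)
        # Small in-place update touching only block 1.
        with open(path, "r+b") as f:
            f.seek(BLOCK)
            f.write(b"y")
        # Force a visibly different mtime (some filesystems have coarse timestamps).
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        for i in range(4):
            cache.read_block(path, i)
        counts.append(cache.misses)
        return counts
```

Running `demo()` should give `[4, 4, 8]`: the second pass is served entirely from cache, but after a single-byte change all four blocks are read again, which is the "poor performance until the cache is repopulated" effect described above.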

So, this approach is probably only workable for a database which is 
small enough to be fully cached, and is also only modified occasionally 
(e.g., in a nightly update).

>> 3) Run Xapian centrally, and access it using the remote protocol from
>> each front end.  This obviously makes it easier to construct the
>> index, but it's not obvious how to make this scalable / HA.
> 
> Should involve less network traffic than mounting the database over NFS
> or similar, but it does put all the search load on one server.
> 
>> 4) Something else?
> 
> SVN trunk has a database replication feature which is aimed at this sort
> of situation.  It's not been released yet, and could still change before
> it is, but it's being used on at least one live site I believe.

Which reminds me - I must update the documentation of the replication 
stuff in SVN.  There are currently two documents about it: 
xapian-core/docs/replication.rst is an overview of how and why to use 
it, and xapian-core/docs/replication_protocol.rst covers (some of) the 
internals of the replication system.  I think a good chunk of the 
replication.rst documentation should probably move to admin_notes.rst 
(in particular, the "Alternative approaches" section probably belongs 
there, since it's not really about the replication stuff).

-- 
Richard


