[Xapian-discuss] Xapian across multiple servers
richard at lemurconsulting.com
Tue Jul 22 18:21:46 BST 2008
Olly Betts wrote:
>> 2) Run Xapian locally on each front-end webserver, but storing the
>> index on shared storage. This will be I/O intensive, but doesn't
>> involve syncing changes out to each front-end.
> Probably OK if the search load is low. A high search load on a large
> database will probably get ugly.
Actually, even with a reasonably high search load, this could work well
provided the Xapian database is very rarely changed and is small enough
to be fully (or largely) cached in memory.
However, whenever the database is changed, it is likely that the entire
database will be dropped from cache, because remote filesystems tend not
to be very efficient at sharing modifications to large files. In
particular, I'm fairly sure that all versions of NFS (well, 2, 3 and 4
at least) have no support for informing the client that a file has been
partially modified, so clients have little option other than to drop the
entire file from cache whenever the mtime of the file is changed. As a
result, search performance will be poor for a while after any change to
the database, until the cache is repopulated.
So, this approach is probably only workable for a database which is
small enough to be fully cached, and is also only modified occasionally
(e.g., in a nightly update).
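To illustrate the cache behaviour described above, here is a toy model in
Python. It is only a sketch, not real NFS client code: the names
WholeFileCache and run_searches are made up for the example, and it assumes
the client's only invalidation signal is a changed mtime, so any change
discards every cached block of the file.

```python
class WholeFileCache:
    """Toy model of a client cache that can only invalidate whole files."""

    def __init__(self):
        self.cached_mtime = None
        self.blocks = set()
        self.hits = 0
        self.misses = 0

    def read(self, block, server_mtime):
        # A changed mtime says only that *something* in the file changed,
        # so every cached block of the file has to be discarded.
        if server_mtime != self.cached_mtime:
            self.blocks.clear()
            self.cached_mtime = server_mtime
        if block in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            self.blocks.add(block)


def run_searches(cache, mtime, blocks):
    """Read the given blocks and return the cache hit rate for this batch."""
    hits_before, misses_before = cache.hits, cache.misses
    for block in blocks:
        cache.read(block, mtime)
    hits = cache.hits - hits_before
    total = hits + (cache.misses - misses_before)
    return hits / total


cache = WholeFileCache()
working_set = list(range(100))  # blocks touched by a typical batch of searches

cold = run_searches(cache, mtime=1, blocks=working_set)          # empty cache
warm = run_searches(cache, mtime=1, blocks=working_set)          # fully cached
after_update = run_searches(cache, mtime=2, blocks=working_set)  # index changed
recovered = run_searches(cache, mtime=2, blocks=working_set)     # cache refilled

print(cold, warm, after_update, recovered)  # prints: 0.0 1.0 0.0 1.0
```

In this model the hit rate falls back to zero after a single update, however
small the change, and only recovers once the working set has been read
again. That is the "poor for a while after any change" effect.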
>> 3) Run Xapian centrally, and access it using the remote protocol from
>> each front end. This obviously makes it easier to construct the
>> index, but it's not obvious how to make this scalable / HA.
> Should involve less network traffic than mounting the database over NFS
> or similar, but it does put all the search load on one server.
>> 4) Something else?
> SVN trunk has a database replication feature which is aimed at this sort
> of situation. It's not been released yet, and could still change before
> it is, but it's being used on at least one live site I believe.
Which reminds me - I must update the documentation of the replication
stuff in SVN. There are currently two documents about it:
xapian-core/docs/replication.rst is an overview of how and why to use
it, and xapian-core/docs/replication_protocol.rst covers (some of) the
internals of the replication system. I think a good chunk of the
replication.rst documentation should probably move to admin_notes.rst
(in particular, the "Alternative approaches" section probably belongs
there, since it's not really about the replication stuff).