[Xapian-discuss] using Xapian as backend for google

Chris Good chris at g2.nu
Wed Dec 13 15:54:44 GMT 2006

Olly Betts wrote:
> Webtop used xapian-tcpsrv to spread searches over a number of boxes
> (10 or so IIRC).  The index size was around 500 million documents, but
> with modern hardware that's much less of a challenge than it was more
> than 6 years ago.

Gosh, is it that long ago?  Back then we used 20 dual-processor boxes
per cluster, with each cluster holding a complete dataset.  Processor
speeds ranged from 500 to 850MHz, and I recall 2GB of RAM being fitted
to the machines.

For Webtop I don't think we bothered with any redundancy in the
disks.  I certainly wouldn't do so these days: I'd just keep a couple
of spare machines around and, upon failure, assign one of those to
take over the DB file of the failed machine.  This requires a
centralised repository of all your data that machines can sync from,
a nice fast network (gigabit is essentially free these days and is
more than adequate) and the means to do an automated boot/installation
of machines.
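
As a rough illustration of the spare-machine idea, here is a minimal
Python sketch of the bookkeeping only.  The Cluster class, the shard
names and the host names are all made up for the example; none of this
is what Webtop actually ran, and the copy from the central repository
is left as a comment rather than implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Hypothetical bookkeeping for which host serves which DB shard."""

    # shard name -> host currently serving it
    assignments: Dict[str, str]
    # idle machines that are already booted and installed
    spares: List[str] = field(default_factory=list)

    def fail_over(self, failed_host: str) -> Dict[str, str]:
        """Hand every shard on failed_host to a spare.

        Returns a mapping of shard -> new host.  Raises RuntimeError if
        the pool of spares runs out.
        """
        moved: Dict[str, str] = {}
        for shard, host in list(self.assignments.items()):
            if host != failed_host:
                continue
            if not self.spares:
                raise RuntimeError("no spare available for " + shard)
            new_host = self.spares.pop(0)
            # In a real setup the spare would now sync this shard's DB
            # file from the central repository before serving searches.
            self.assignments[shard] = new_host
            moved[shard] = new_host
        return moved


cluster = Cluster(
    assignments={"shard-00": "node01", "shard-01": "node02"},
    spares=["spare01", "spare02"],
)
moved = cluster.fail_over("node02")
```

The point of keeping it this simple is that the disks themselves need no
redundancy: the central repository holds the only copy that matters, and
a spare just pulls what it needs.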

For Webtop we used a NetApp server for the central store and had a
kickstart configuration and bootp/tftp server (you'd use PXE and kickstart
these days).  When a machine failed you'd replace the disk or whatever
and network boot it, which would install the local OS and the RPMs
containing all our localised configuration.

In this day and age I'd be looking at lowish-cost servers with a couple
of SATA drives (you don't really need the capacity, but the cost of SAS
is too high to justify in my mind) and as much memory as I could shove
in the box.
