[Xapian-discuss] Optimal usage of xapian-compact for merging

Henry C. henka at cityweb.co.za
Mon Mar 29 13:51:21 BST 2010


On Tue, March 23, 2010 19:46, Kevin Duraj wrote:
> I am merging 300 indexes at once, it takes less than a day for merge
> to happen for 100 million documents, during merging I notice very heavy IO.

That IO sounds pretty normal.  To help with IO load, we have a dedicated
index store cluster, dedicated source data cluster, dedicated indexing
cluster, etc.  Sigh.  Each time I think we have enough h/w and
source/index space, reality rudely smirks and I have to scale some more...

btw, what's your index size?
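
For reference, a rough sketch of how a many-way merge like this can be driven from a script. It only builds the xapian-compact command line; the database paths are hypothetical, and the --multipass and --blocksize options should be checked against `xapian-compact --help` for your Xapian version before relying on them.

```python
import shlex


def build_compact_command(sources, dest, multipass=True, blocksize=None):
    """Return an argv list that merges `sources` into `dest` with xapian-compact.

    All paths are supplied by the caller; nothing is executed here.
    """
    if not sources:
        raise ValueError("need at least one source database")
    cmd = ["xapian-compact"]
    if multipass:
        # Merge postlists in several passes; intended for merging many
        # databases, at the cost of extra temporary disk space.
        cmd.append("--multipass")
    if blocksize is not None:
        # Assumption: block size in bytes, a power of two from 2048 to 65536.
        if blocksize < 2048 or blocksize > 65536 or blocksize & (blocksize - 1):
            raise ValueError("blocksize must be a power of 2 between 2048 and 65536")
        cmd.append(f"--blocksize={blocksize}")
    cmd.extend(sources)
    cmd.append(dest)
    return cmd


if __name__ == "__main__":
    # Hypothetical layout: 300 partial indexes merged into one database.
    parts = [f"/indexes/part-{i:03d}" for i in range(300)]
    print(shlex.join(build_compact_command(parts, "/indexes/merged")))
```

The resulting list can be passed to subprocess.run() once the paths are real. Since the merge is IO-bound, it probably helps to keep the destination on a different spindle from the sources.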

> Tomorrow I am planning to install a new Seagate Barracuda XT
> hard drive (2TB, 7200 RPM, SATA 6Gb/s, 64MB cache) on the find1friend.com
> server. It will replace my old 1TB Barracuda, which is running out of
> space. My old system runs CentOS 5 with a 1KB disk block size and
> two Xapian indexes of around 150 million documents, running
> fairly fast as you can see: http://find1friend.com/  Although I might
> not be able to use SATA 6Gb/s without an additional interface, let's see what
> happens. I don't want to set my datacenter on fire; my co-location
> providers are very nice to me. :-)
>
> tune2fs -l /dev/sda1
> Block size:               1024
> Fragment size:            1024
>
>
> Performance is excellent, but I will try using Ubuntu Server 9.10

Your performance is quick indeed.  Have you tried parallel load tests with
something like hammerhead?  i.e., how do things perform when you have
5/16/32/64/n users (with different queries) hitting your engine?  Your RAM
is going to be a big factor here depending on the size of your index.
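
If you don't have a load-testing tool handy, a minimal sketch of the same idea in Python looks like the following. It is not hammerhead; `fake_search` is a hypothetical stand-in that you would replace with a real call into your search front end, and the query list is made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(search_fn, query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    search_fn(query)
    return time.perf_counter() - start


def run_load_test(search_fn, queries, concurrency):
    """Issue all queries with `concurrency` simulated users in parallel."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda q: timed_call(search_fn, q), queries))


def summarize(latencies):
    """Return count, median, and worst-case latency."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }


def fake_search(query):
    """Hypothetical placeholder: swap in an HTTP request or Xapian query."""
    time.sleep(0.001)
    return []


if __name__ == "__main__":
    queries = ["alpha", "beta", "gamma", "delta"] * 16  # made-up queries
    for users in (5, 16, 32, 64):
        print(users, summarize(run_load_test(fake_search, queries, users)))
```

Use different queries per simulated user, otherwise you mostly measure the disk cache rather than the index.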

/sidebar:  I remember a couple of years ago checking out a competitor
(local) search engine (lucene-based I think).  Using a single tab in
firefox the performance was ok.  When I opened up a bunch and performed
simultaneous searches their machine ground to a halt...  gotta plan for
that load.

> with disk-block size 16KB to see whether the search engine gets better,

That will be quite interesting to hear about since it's also something
I've pondered.  Please post your results!


> PS: Searching 150 million documents from one hard drive using Xapian.
> Imagine what Xapian could do using two hard drives! :-)

You can try short-stroking the drives in combination with RAID0 :)

h
