[Xapian-discuss] Xapian on SSD vs SATA

Henry henka at cityweb.co.za
Fri Oct 23 14:23:09 BST 2009


Quoting "Arjen van der Meijden" <acmmailing at tweakers.net>:
> On 23-10-2009 14:11, Henry wrote:
>> As you can see, keyword queries are quicker, but lack the wow! factor.
>> Phrase queries *really* benefit - 10x faster is what one would  
>> expect from using an SSD.
>
> Why would you have that expectation? If most of the data you're  
> looking at fits in RAM, then SSD vs normal disk will see only small  
> differences (not that 2 vs 1.2 is insignificant).

I was referring to very large indexes - not small ones which might fit
in RAM.  I'm not sure what you know about SSDs, but the performance
gains can be very significant depending on the application - for an
RDBMS, e.g., the gain is typically 10x in our experience (so, instead
of waiting for a 10s transaction to complete, the same can be
completed in under a second - that's night/day when dealing with
customer expectations).

You're thinking in terms of small indexes.  I'm referring to splitting
a very large index across many cluster nodes for performance.  I can't
give a final size at this stage, since indexing is ongoing, but the
index is already ~900GB.

2s vs 1.2s is not insignificant (maybe for your application it is, and
that's fine).  You're assuming that *your* users' expectations are the
same as ours.  More importantly, consider search volumes.  Yours may
be 1 every hour, ours might be 100-200 a minute.
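
As a rough illustration of the volume point above, here is a
back-of-envelope sketch in Python.  It assumes Little's law (mean
queries in flight = arrival rate x mean latency) applies, and it treats
2s and 1.2s as mean per-query latencies; neither assumption was
measured here, and the function name in_flight is purely illustrative.

```python
def in_flight(queries_per_minute: float, latency_s: float) -> float:
    """Little's law: mean concurrent queries = arrival rate * mean latency.

    Assumes a steady arrival rate and that latency_s is the mean
    time a query spends in the system (both are assumptions).
    """
    arrival_rate_per_s = queries_per_minute / 60.0
    return arrival_rate_per_s * latency_s


if __name__ == "__main__":
    # Volumes and latencies taken from the figures quoted in this thread.
    for qpm in (100, 200):
        for latency in (2.0, 1.2):
            load = in_flight(qpm, latency)
            print(f"{qpm} q/min at {latency}s -> {load:.2f} queries in flight")
```

Under those assumptions, at 200 queries a minute the faster disk
leaves roughly 4 queries in flight on average instead of roughly 6.7,
which is load the cluster no longer has to absorb.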

>> So,... to conclude:  SSDs provide excellent gains for phrase  
>> queries - something 99.9999% of users won't use anyway :(
>
> Well, the fastest queries probably didn't need much optimization
> anyway... So perhaps you shouldn't bother too much about those
> faster queries, they're fast enough to begin with.

No, they're not.

> Apart from that, you should probably attempt to test a mix of
> queries to see how much influence a running phrase query has on the
> other queries at that time.

This will be mitigated by cluster load balancing and scaling.

> But your results are similar to ours, which I posted a while back to
> the list. The fast queries are CPU-bound (on the top Nehalem); even
> forcing the system to have only a few GB of memory didn't make it
> IO-bound in our tests.

Thanks for your post.  Anyway, it's imperative in our application to
meet customer expectations (which is why I emphasised the 2/1.2 second
difference).  We cannot expect our customers to wait 2-5s, never mind
30s, for a query to complete (especially when their queries are
rapid-fire and they have been spoilt by Google).

For you and me, waiting for a query to complete gives us time to get
more coffee; for a customer it means picking up the phone and
complaining (or taking their money elsewhere) ;)

Cheers
Henry


