[Xapian-discuss] Xapian on SSD vs SATA
Henry
henka at cityweb.co.za
Fri Oct 23 14:23:09 BST 2009
Quoting "Arjen van der Meijden" <acmmailing at tweakers.net>:
> On 23-10-2009 14:11, Henry wrote:
>> As you can see, keyword queries are quicker, but lack the wow! factor.
>> Phrase queries *really* benefit - 10x faster is what one would
>> expect from using an SSD.
>
> Why would you have that expectation? If most of the data you're
> looking at fits in RAM, then SSD vs normal disk will see only small
> differences (not that 2 vs 1.2 is insignificant).
I was referring to very large indexes - not small ones which might fit
in RAM. I'm not sure what you know about SSDs, but the performance
gains can be very significant depending on the application - for an
RDBMS, e.g., the gain is typically 10x in our experience (so, instead
of waiting for a 10s transaction to complete, the same can be
completed in under a second - that's night and day when dealing with
customer expectations).
You're thinking in terms of small indexes. I'm referring to splitting
a very large index across many cluster nodes for performance. I'm not
sure of the final size at this stage, since indexing is ongoing, but
the index is already ~900GB.
2s vs 1.2s is not insignificant (maybe for your application it is, and
that's fine). You're making the assumption that *your* user
expectation is the same as ours. More importantly, consider search
volumes. Yours may be 1 every hour, ours might be 100-200 a minute.
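To put rough numbers on the volume point, here is a back-of-envelope
sketch. It assumes queries arrive at a steady rate and uses Little's law
(average queries in flight = arrival rate x time per query). The function
name is illustrative, and the inputs are just the figures mentioned in
this thread, not measurements.

```python
def in_flight(queries_per_minute: float, latency_s: float) -> float:
    """Average number of queries in progress at once.

    Little's law: L = arrival rate * time in system.
    Assumes a steady arrival rate (a simplification).
    """
    return queries_per_minute / 60.0 * latency_s


if __name__ == "__main__":
    # Figures from the thread: 100-200 queries a minute, 2s vs 1.2s per query.
    for qpm in (100, 200):
        for latency in (2.0, 1.2):
            load = in_flight(qpm, latency)
            print(f"{qpm} q/min at {latency}s each: {load:.1f} queries in flight")
```

Under those assumptions, the slower per-query time means proportionally
more queries are competing for the same disks and CPUs at any moment,
which is why a sub-second difference matters more at high volume.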
>> So,... to conclude: SSDs provide excellent gains for phrase
>> queries - something 99.9999% of users won't use anyway :(
>
> Well, the fastest queries probably didn't need much optimization
> anyway... So perhaps you shouldn't bother too much about those
> faster queries, they're fast enough to begin with.
No, they're not.
> Apart from that; You should probably attempt to test a mix of
> queries to see how much influence a running phrase query has on the
> other queries at that time.
This will be mitigated by cluster load balancing and scaling.
> But your results are similar to ours, which I posted a while back to
> the list. The fast queries are cpu-bound (on the top-nehalem), even
> forcing the system to have only a few GB of memory didn't make it
> IO-bound in our tests.
Thanks for your post. Anyway, it's imperative in our application to
meet customer expectations (which is why I emphasised the 2/1.2 second
difference). We cannot expect our customers to wait 2-5s, never mind
30s, for a query to complete (especially when they're rapid-fire and
have been spoilt by Google).
For you and me, waiting for a query to complete gives us time to get
more coffee; for a customer it means picking up the phone and
complaining (or taking their money elsewhere) ;)
Cheers
Henry