[Xapian-discuss] Xapian on SSD vs SATA
Arjen van der Meijden
acmmailing at tweakers.net
Fri Oct 23 15:00:04 BST 2009
On 23-10-2009 15:23, Henry wrote:
>
> I was referring to very large indexes - not small ones which might fit
> in RAM. I'm not sure what you know about SSDs, but the performance
> gains can be very significant depending on the application - for RDBMS,
> eg, the gain is typically 10x from our experience (so, instead of
> waiting for a 10s transaction to complete, the same can be completed in
> under a second - that's night/day when dealing with customer expectations).
I know plenty about SSDs and such :) My point is that the hottest parts of
the database will fit in memory, even for databases that are more than
twice the size of your memory.
Our "small" index is about 25GB by the way, but does indeed fit in the
24GB memory of our server.
> You're thinking in terms of small indexes. I'm referring to splitting a
> very large index across many cluster nodes for performance. I'm not
> sure at this stage, since indexing is ongoing, but the index is already
> ~900GB.
Well, given the fact that you're testing with "only" 4 and 14GB... If
you're only going to put a 14GB database on a node, you shouldn't even
bother with SSDs; just put a SATA disk and 16GB of RAM in the machine,
with plenty of CPU power. With current Nehalem hardware pricing, that is
actually a relatively affordable strategy for up to 72GB of RAM per
node; beyond that, the price per GB no longer scales linearly.
Obviously, disks and even enterprise SSDs are cheaper per GB.
My guess is that the difference between your disks will increase further
as you move farther beyond the RAM size, i.e. a 50 or 100GB database
will see a larger difference. Beyond some point, even the hotter parts
of the B-tree will no longer fit in memory; that's where I'd expect
SSD vs disk to really pay off.
How many cluster nodes are you thinking about? Using 4 nodes, each with
a quarter of the database and a budget of $15k per server, will
obviously have different characteristics compared to a 20-node cluster
with only $2k per server.
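For what it's worth, Xapian can already search several (including remote)
databases as if they were one, so the sharding itself is mostly handled for
you. As a language-agnostic sketch of the underlying idea, merging the
per-shard ranked result lists is just a k-way merge; the shard contents,
scores and document ids below are made up purely for illustration:

```python
import heapq

def merge_shard_results(shard_results, limit):
    """Merge per-shard ranked lists into one global top-`limit` list.

    Each shard's list is assumed to already be sorted by descending
    score, as a search backend would return it.
    """
    # heapq.merge does a lazy k-way merge of already-sorted inputs;
    # negating the scores turns "highest score first" into the
    # ascending order that heapq.merge expects.
    merged = heapq.merge(*([(-score, doc) for score, doc in shard]
                           for shard in shard_results))
    return [(-neg_score, doc) for neg_score, doc in list(merged)[:limit]]

# Two hypothetical shards, each holding a quarter (well, half here)
# of the index:
shard_a = [(0.9, "doc1"), (0.4, "doc7")]
shard_b = [(0.8, "doc3"), (0.7, "doc5")]
print(merge_shard_results([shard_a, shard_b], 3))
# -> [(0.9, 'doc1'), (0.8, 'doc3'), (0.7, 'doc5')]
```

The nice property is that each node only has to rank its own quarter of
the data; the merge step is cheap compared to the per-shard searches.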
> 2s vs 1.2s is not insignificant (maybe for your application it is, and
> that's fine). You're making the assumption that *your* user expectation
> is the same as ours. More importantly, consider search volumes. Yours
> may be 1 every hour, ours might be 100-200 a minute.
We operate a large website, with performance-savvy, experienced computer
users... So we, as developers, try to keep each server-generated page
under 0.1 seconds, and that includes searches in that 25GB database :)
Our search volume is similar to yours, although our database is obviously
much smaller and we only use one server.
Our normal searches were already mostly below that 0.1-second threshold;
the SSDs, oversized RAM and top-of-the-line CPUs made sure most phrase
queries are now pretty fast as well.
> No, they're not.
I'd also suggest you try whether a faster CPU (or one with more memory
bandwidth) speeds up those single-term queries. As I said, in our
benchmarks, with our RAM and SSDs, we ended up being CPU-bound.
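A cheap way to check this yourself is to time the same query repeatedly:
if a repeat run against a warm page cache is no faster than the first
(cold) run, the bottleneck is more likely CPU than disk. A minimal
sketch, where `run_query` is a hypothetical stand-in for whatever issues
the real search (here just a dummy CPU-bound loop):

```python
import time

def run_query():
    # Stand-in for the actual search call; deliberately CPU-bound.
    return sum(i * i for i in range(100_000))

def time_query(runs=3):
    """Time `runs` consecutive executions of the query."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings

timings = time_query()
# Roughly constant timings across runs suggest a CPU-bound query;
# a large drop after the first run points at disk caching instead.
```

Run it once with a cold OS cache (e.g. right after dropping caches) and
compare the first timing against the later ones.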
By the way, perhaps ext4 will give you some gains as well.
> Thanks for your post. Anyway, it's imperative in our application to
> meet customer expectations (which is why I emphasised the 2/1.2 second
> difference). We cannot expect our customers to wait 2-5s, never mind
> 30s, for a query to complete (especially when they're rapid-fire and
> have been spoilt by google).
I totally agree; there have been some reports about users losing their
interest if a page takes more than 0.2-1 second...
Best regards,
Arjen