[Xapian-discuss] Filesystems
Arjen van der Meijden
acmmailing at tweakers.net
Mon Jul 13 16:45:40 BST 2009
My colleague has finished testing. Please bear in mind that the following
results are not exact science: they were obtained with one specific
database on a fairly unusual system.
Our Xapian database is about 25GB in size, or 19GB compacted. It was
built from an 8.7GB text file fed into scriptindex, and is searched with
omega (using a very simple template).
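For anyone wanting to reproduce a comparable setup: building a database
from a flat text dump with scriptindex looks roughly like the sketch
below. The index script, field names and paths are made-up examples, not
our actual configuration.

# Rough sketch of building a Xapian database from a flat text dump with
# scriptindex. The index script, field names and paths are hypothetical.
import subprocess

script_lines = [
    "id : boolean=Q unique=Q field=id",
    "title : field=title index=S",
    "text : index field=text",
]
with open("articles.script", "w") as fh:
    fh.write("\n".join(script_lines) + "\n")

# scriptindex DATABASE INDEX_SCRIPT INPUT_FILE...
# The input file holds "field=value" records separated by blank lines.
subprocess.run(
    ["scriptindex", "/ssd/xapian-db", "articles.script", "articles.dump"],
    check=True,
)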
The benchmark-system:
The benchmarks were done on a Dell PowerEdge R610 with 2x X5570 CPUs,
24GB of 1066MHz DDR3 memory, 2x 300GB 10k rpm SAS disks in RAID1 for
data, and 4x 50GB SSDs in RAID5 for the Xapian database. The RAID5 had a
stripe size of 256KB, which was chosen somewhat arbitrarily (larger is
normally better, and this should also be the size in which SSDs
typically cluster their writes).
The software was a normal, recent Debian 'testing' with a hand-compiled
2.6.30 kernel. We used Xapian 1.0.13 both to create the initial database
and for the rest of the benchmarks. We didn't change any kernel
parameters.
All the write benchmarks were single-process, with the SSDs as the
target device. All the search benchmarks were multi-process, with a
separate omega started on the command line for each query, up to a
maximum of 25 concurrent omega processes.
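To give an idea of what that looks like in practice, here is a rough
sketch of such a search driver (not our actual harness): it replays
logged queries through the omega binary with at most 25 concurrent
processes. The binary path, query file and template name are
placeholders.

# Rough sketch of the concurrent search benchmark: replay logged queries
# through omega, at most 25 omega processes at a time. Paths, the query
# file format and the template name (FMT) are placeholders.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

OMEGA = "/usr/lib/cgi-bin/omega"   # path to the omega CGI binary (assumed)
QUERY_LOG = "queries.txt"          # one query string per line (assumed)
MAX_CONCURRENT = 25

def run_query(query):
    # omega accepts its CGI parameters as NAME=value command-line arguments
    subprocess.run(
        [OMEGA, "P=" + query, "DB=default", "FMT=simple"],
        stdout=subprocess.DEVNULL,
        check=False,
    )

with open(QUERY_LOG) as fh:
    queries = [line.strip() for line in fh if line.strip()]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    list(pool.map(run_query, queries))
print("%d queries in %.1fs" % (len(queries), time.monotonic() - start))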
File-systems:
We tested the following filesystems: ext2, ext3, ext4, xfs, reiser3,
btrfs and nilfs. Nilfs wasn't very stable (it disappeared during
benchmarking), although it was very fast in some tests. All filesystems
were created and mounted with the default flags, apart from enabling
'noatime'.
Benchmarks:
We did three write tests and one read test. The write tests were:
copying a fresh start database from the data drives to the SSDs,
updating that database with a fixed scriptindex input file, and finally
compacting that database, reading from and writing to the SSDs.
The read test was, obviously, executing several thousand queries from
our logs. To prevent the kernel from heavily caching the filesystem in
memory, we started an inactive process holding 20GB of memory and
disabled swap. This wasn't ideal, but we couldn't boot with mem=4G or
something similar (the machine would crash at boot then).
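For the curious, the "inactive process" was essentially a memory hog
along the lines of the sketch below: allocate a big anonymous mapping,
touch every page so it is really backed by RAM, and then sleep. With
swap disabled, that memory cannot be reclaimed for the page cache. (This
is just the general idea, not the exact program we used.)

# Memory hog sketch: keep 20GB of RAM occupied so the kernel cannot use
# it for the page cache. Adjust SIZE for your own machine.
import mmap
import time

GIB = 1024 ** 3
SIZE = 20 * GIB
PAGE = mmap.PAGESIZE

buf = mmap.mmap(-1, SIZE)            # anonymous mapping
for off in range(0, SIZE, PAGE):     # touch each page to force allocation
    buf[off] = 1

time.sleep(10 ** 9)                  # stay resident until killed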
Times are in seconds, so lower is better.
Copying the database from the RAID1 to the SSD RAID5:
btrfs 216
xfs 224
reiser3 230
ext2 232
ext4 232
ext3 239
nilfs 346
Database updating:
nilfs 218
ext4 231
btrfs 232
ext2 261
ext3 265
reiser3 265
xfs 267
Database compaction:
xfs 470
ext4 478
btrfs 478
reiser3 514
ext2 560
ext3 577
nilfs 761
Executing about 2500 of our slowest queries with the database forced to
be I/O-bound:
ext4 766
xfs 768
ext3 785
ext2 795
reiser3 801
nilfs 837
btrfs 849
From these results, we opted for ext4 for our system.
We also ran several benchmarks with various compacted databases. During
these search benchmarks we didn't cripple the memory, and we didn't
cherry-pick the worst queries. Again the numbers are in seconds, so
lower is better. The scores below are read-only benchmarks, mostly
served from memory rather than from the ext4 filesystem:
database run1 run2
non-compacted 104.8 105.4
fuller compact 99.5 100.5
block size 2kb 135.8 136.5
block size 4kb 136.5 133.9
block size 8kb 100.9 101.6
block size 16kb 82.1 81.8
block size 32kb 79.4 79.0
block size 64kb 88.8 88.4
Fuller compaction doesn't seem to offer much value over a normally
compacted (8kb block size) database, but this may be different in an
I/O-limited benchmark; ours was CPU-limited.
Apart from that, a 32kb block size seems to be a wise choice for us.
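For reference, compacted variants like the ones above can be produced
with xapian-compact, roughly like this (paths are placeholders):

# Sketch of producing the compacted variants compared above with
# xapian-compact. Paths are placeholders; --fuller and -b/--blocksize are
# standard xapian-compact options.
import subprocess

SRC = "/ssd/xapian-db"   # uncompacted source database (placeholder)

# "fuller" compaction at the default (8kb) block size
subprocess.run(["xapian-compact", "--fuller", SRC, "/ssd/db-fuller"], check=True)

# one normally compacted copy per block size tested above
for kib in (2, 4, 8, 16, 32, 64):
    subprocess.run(
        ["xapian-compact", "-b", str(kib * 1024), SRC, "/ssd/db-%dk" % kib],
        check=True,
    )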
I hope you find these numbers useful. We can't really run any other
benchmarks on this system any more, but we're thinking of running some
of the same benchmarks soon on the old system it's replacing (which has
less memory and slower disks).
Best regards,
Arjen
On 1-7-2009 13:58 Arjen van der Meijden wrote:
> Apart from noatime we haven't used any specific additional mount- or
> mkfs options yet. But we're going to check a few to see how they'll do.
>
> Best regards,
>
> Arjen
>
> On 1-7-2009 12:00, Frank John Bruzzaniti wrote:
>> Arjen,
>>
>> Are you using any mount options optimisations like noatime or noboundary
>> with xfs?
>> Are you using any mount option optimisations with ext4?
>>
>>
>> Arjen van der Meijden wrote:
>>> My colleague is testing several filesystems on our new search-machine.
>>> He has been looking at a few of the filesystems available in the
>>> 2.6.30 linux kernel, ext2/3/4, xfs, btrfs, nilfs2 and reiser4.
>>>
>>> "Unfortunately" the new machine has 24GB ram and 4x ssd in raid5. To
>>> get somewhat IO-bound results we had to cripple the machine (by making
>>> sure it couldn't use 20gb of those 24gb for file-cache) *and*
>>> cherry-pick our queries (only the heaviest with phrase-queries and such).
>>>
>>> In the normal scenario of having the full ram (or even 4gb) available
>>> and the ssd's backing up any cache-miss, it is simply cpu-bound. And
>>> that is with the fastest x86 2-socket cpu's available right now, a
>>> pair of intel X5570's. The good news is that it actually appears to
>>> scale very well when using more cpu-cores (this one has 8 cores with 8
>>> hyper-threading cores) and that we can get about 90 searches per
>>> second out of it, which is more than we do now per minute (and we
>>> haven't benchmarked the compacted database yet).
>>>
>>> I.e. our results indicate that for our reads it hardly matters which
>>> filesystem we pick, as most of the database will be in RAM anyway.
>>>
>>> With the crippled, extra-I/O read scenario, we do see differences in
>>> performance between the filesystems tested.
>>>
>>> When finished, we'll have numbers for linear writes (copying the 25GB
>>> database from another disk array), non-linear writes (updating the
>>> database) and semi-linear writes (compacting the database) with
>>> semi-linear (memory-backed) reads.
>>> And of course the numbers for the crippled read-scenario.
>>>
>>> So far ext4 and xfs seem to be the best choices, in both read and
>>> write scenarios. But obviously, our numbers were obtained with SSDs,
>>> not normal disks.
>>>
>>> We haven't yet tested the various mount/mkfs-options (apart from
>>> enabling noatime), but we'll probably settle for ext4 and then try a
>>> few options to better suit the filesystem to the underlying
>>> blockdevice. After that we'll also try what the various
>>> compaction-options do to the read-performance.
>>>
>>> Best regards,
>>>
>>> Arjen
>>>
>>> On 1-7-2009 3:18 James Aylett wrote:
>>>
>>>> On Tue, Jun 30, 2009 at 06:12:55PM -0700, Kevin Duraj wrote:
>>>>
>>>>
>>>>> Based on my observation Flint runs best on ext3 filesystems
>>>>>
>>>> I don't suppose you're able to share any of your numbers from this? I
>>>> know it's not always possible, but having something on the wiki would
>>>> be useful to people, if only to point them in useful directions of how
>>>> to construct their own testing. (I assume you were comparing against
>>>> JFS and XFS, maybe Reiser?)
>>>>
>>>> J
>>>>
>>>>