[Xapian-discuss] Filesystems

Arjen van der Meijden acmmailing at tweakers.net
Mon Jul 13 16:45:40 BST 2009

My colleague has finished testing. Please bear in mind that the following 
results are not exact science: they were obtained with one specific 
database on a fairly unusual system.

Our Xapian database is about 25GB in size, and 19GB compacted. It was 
built from an 8.7GB text file fed into scriptindex, and searched with 
omega (using a very simple template).
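For anyone reproducing this kind of setup: scriptindex pairs an index script with a dump of field=value records. A minimal, hypothetical index script (the field names here are assumptions; the post doesn't show the real one) might look like:

```
url : boolean=Q unique=Q field=url
title : index=S field=title
body : index truncate=300 field=sample
```

The matching input file would contain records of `url=...`, `title=...` and `body=...` lines, with a blank line between documents, fed in with `scriptindex /path/to/db indexscript datafile`.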

The benchmark-system:

The benchmarks were done on a Dell PowerEdge R610 with 2x X5570 CPUs, 
24GB of 1066MHz DDR3 memory, 2x 300GB 10k rpm SAS disks in RAID1 for the 
data, and 4x 50GB SSDs in RAID5 for the Xapian database. The RAID5 had a 
stripe size of 256KB, chosen somewhat arbitrarily (larger is normally 
better, and this should also match the size in which SSDs normally 
cluster their writes).
The software was a normal, recent Debian 'testing' with a hand-compiled 
2.6.30 kernel. We used Xapian 1.0.13 both to create the initial database 
and for the rest of the benchmarks. We didn't change any kernel parameters.

All the write benchmarks were single-process, with the SSDs as the 
target device. All the search benchmarks were multi-process, with an 
omega process started on the command line for each query, and at most 25 
omega processes running at the same time.
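The search side of the benchmark can be sketched roughly like this. The omega invocation and the P= query parameter are assumptions about the setup, not the exact commands used:

```python
# Sketch of the search benchmark driver: one omega process per query,
# with at most 25 in flight at any time.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_query(query, cmd=("omega",)):
    # omega accepts CGI-style parameters on the command line;
    # "P=<terms>" sets the probabilistic query (assumed invocation).
    proc = subprocess.run(list(cmd) + ["P=" + query],
                          capture_output=True, text=True)
    return proc.returncode

def benchmark(queries, max_procs=25, cmd=("omega",)):
    # Threads suffice here: each worker only waits on its child process.
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(lambda q: run_query(q, cmd), queries))
```

Timing `benchmark()` over the replayed query log with `time` (or `time.monotonic()` around the call) gives the per-filesystem numbers below.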


We tested the following filesystems: ext2, ext3, ext4, xfs, reiser3, 
btrfs and nilfs. Nilfs wasn't too stable (it crashed during 
benchmarking), although it did complete some tests very fast. All 
filesystems were created and mounted with the default flags, apart from 
enabling 'noatime'.

We ran three write tests and one read test. The write tests: copying a 
fresh start database from the data drives to the SSDs, updating that 
database with a fixed scriptindex input file, and finally compacting 
that database, reading from and writing to the SSDs.

The read test was, obviously, executing several thousand queries from 
our logs. To prevent the kernel from caching most of the filesystem in 
memory, we loaded an inactive process locking 20GB of the memory and 
disabled swap. This wasn't ideal, but we couldn't boot with mem=4G or 
something similar (the machine would then crash at boot).
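The cache-crippling trick can be approximated as follows: allocate a large buffer, touch every page so it is backed by RAM, and pin it with mlockall(2) so the kernel cannot reclaim it for the page cache. This is a sketch, not the exact process used; pinning 20GB needs CAP_IPC_LOCK or a generous `ulimit -l`:

```python
# Hold and pin a large chunk of RAM so the kernel cannot use it as
# page cache (Linux-specific; needs privileges for large sizes).
import ctypes
import ctypes.util

MCL_CURRENT, MCL_FUTURE = 1, 2  # Linux mlockall(2) flags

def hog_memory(mib, pin=True):
    buf = bytearray(mib * 1024**2)
    # Touch every page so the allocation is actually backed by RAM.
    for off in range(0, len(buf), 4096):
        buf[off] = 1
    if pin:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
            raise OSError(ctypes.get_errno(), "mlockall failed")
    return buf
```

In the benchmark this would amount to `hog_memory(20 * 1024)` followed by an indefinite sleep, leaving roughly 4GB for everything else.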

Times are in seconds, so lower is better.

Database copying from the raid1 to the ssd-raid5:
btrfs	216
xfs	224
reiser3	230
ext2	232
ext4	232
ext3	239
nilfs	346

Database updating:
nilfs	218
ext4	231
btrfs	232
ext2	261
ext3	265
reiser3	265
xfs	267

Database compaction:
xfs	470
ext4	478
btrfs	478
reiser3	514
ext2	560
ext3	577
nilfs	761

Executing about 2500 of our slowest queries, with I/O forced by the 
memory-locking described above:
ext4	766
xfs	768
ext3	785
ext2	795
reiser3	801
nilfs	837
btrfs	849

From these results, we opted for ext4 for our system.

We also ran several benchmarks with variously compacted databases. 
During these search benchmarks we didn't cripple the memory, and we 
didn't cherry-pick the worst queries. Again the numbers are in seconds, 
so lower is better. The scores below are read-only benchmarks, mostly 
served from memory rather than from the ext4 filesystem:

database	run1	run2
non-compacted	104.8	105.4
fuller compact	99.5	100.5
block size 2kb	135.8	136.5
block size 4kb	136.5	133.9
block size 8kb	100.9	101.6
block size 16kb	 82.1	 81.8
block size 32kb	 79.4	 79.0
block size 64kb	 88.8	 88.4

Fuller compaction doesn't seem to offer much value over a normally (8kb) 
compacted database, but this may be different in an I/O-limited 
benchmark; ours was CPU-limited.
Apart from that, going to a 32kb block size seems to be a wise choice for us.
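The block-size sweep above can be reproduced with xapian-compact, whose -b/--blocksize and -F/--fuller options are real flags; the paths and this timing harness are illustrative assumptions:

```python
# Compact a database once per block size and time each run.
import subprocess
import time

def timed_compact(src, dest, blocksize="8k", fuller=False,
                  cmd="xapian-compact"):
    args = [cmd, "-b", blocksize]   # -b sets the B-tree block size
    if fuller:
        args.append("-F")           # -F requests fuller compaction
    args += [src, dest]
    start = time.monotonic()
    subprocess.run(args, check=True)
    return time.monotonic() - start

def sweep(src, dest_prefix, sizes=("2k", "4k", "8k", "16k", "32k", "64k")):
    return {s: timed_compact(src, dest_prefix + s, blocksize=s)
            for s in sizes}
```

Each resulting database would then be put through the same read-only query replay to produce a table like the one above.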

I hope you find these numbers useful. We can't really run any other 
benchmarks on this system any more, but we're thinking of running some 
of the same benchmarks soon on the old system it's replacing (which has 
less memory and slower disks).

Best regards,


On 1-7-2009 13:58 Arjen van der Meijden wrote:
> Apart from noatime we haven't used any specific additional mount- or 
> mkfs options yet. But we're going to check a few to see how they'll do.
> Best regards,
> Arjen
> On 1-7-2009 12:00, Frank John Bruzzaniti wrote:
>> Arjen,
>> Are you using any mount options optimisations like noatime or noboundary 
>> with xfs?
>> Are you using any mount option optimisations with ext4?
>> Arjen van der Meijden wrote:
>>> My colleague is testing several filesystems on our new search-machine. 
>>> He has been looking at a few of the filesystems available in the 
>>> 2.6.30 linux kernel, ext2/3/4, xfs, btrfs, nilfs2 and reiser4.
>>> "Unfortunately" the new machine has 24GB ram and 4x ssd in raid5. To 
>>> get somewhat IO-bound results we had to cripple the machine (by making 
>>> sure it couldn't use 20gb of those 24gb for file-cache) *and* 
>>> cherry-pick our queries (only the heaviest with phrase-queries and such).
>>> In the normal scenario of having the full ram (or even 4gb) available 
>>> and the ssd's backing up any cache-miss, it is simply cpu-bound. And 
>>> that is with the fastest x86 2-socket cpu's available right now, a 
>>> pair of intel X5570's. The good news is that it actually appears to 
>>> scale very well when using more cpu-cores (this one has 8 cores with 8 
>>> hyper-threading cores) and that we can get about 90 searches per 
>>> second out of it, which is more than we do now per minute (and we 
>>> haven't benchmarked the compacted database yet).
>>> I.e. our results indicate that for our reads it hardly matters which 
>>> filesystem to pick, most of the database will be in RAM any way.
>>> With the crippled, extra-I/O read scenario, we do see differences in 
>>> performance between the filesystems tested.
>>> When finished, we'll have numbers for linear writes (copying the 25GB 
>>> database from another disk array), non-linear writes (updating the 
>>> database) and semi-linear writes (compacting the database) with 
>>> semi-linear (memory-backed) reads.
>>> And of course the numbers for the crippled read-scenario.
>>> So far ext4 and xfs seem to be the best choices, in both the read and 
>>> write scenarios. But obviously, our numbers were obtained with SSDs, not 
>>> normal disks.
>>> We haven't yet tested the various mount/mkfs-options (apart from 
>>> enabling noatime), but we'll probably settle for ext4 and then try a 
>>> few options to better suit the filesystem to the underlying 
>>> blockdevice. After that we'll also try what the various 
>>> compaction-options do to the read-performance.
>>> Best regards,
>>> Arjen
>>> On 1-7-2009 3:18 James Aylett wrote:
>>>> On Tue, Jun 30, 2009 at 06:12:55PM -0700, Kevin Duraj wrote:
>>>>> Based on my observation Flint runs best on ext3 filesystems
>>>> I don't suppose you're able to share any of your numbers from this? I
>>>> know it's not always possible, but having something on the wiki would
>>>> be useful to people, if only to point them in useful directions of how
>>>> to construct their own testing. (I assume you were comparing against
>>>> JFS and XFS, maybe Reiser?)
>>>> J
>>> _______________________________________________
>>> Xapian-discuss mailing list
>>> Xapian-discuss at lists.xapian.org
>>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
