[Xapian-discuss] Initial benchmark results quartz and flint
Arjen van der Meijden
acmmailing at tweakers.net
Wed Jun 29 12:19:54 BST 2005
I've done some benchmarking and have the first set of results here. The
databases (their size and parameters) were described on the list earlier
this month, if you're interested.
It appears from these results that flint is significantly faster to
search, both with phrase queries and with normal queries. Another,
somewhat surprising, result is that the non-compacted quartz databases
are *much* faster with phrase queries than the compacted ones.
Database                 Non-phrase (s)   Phrase (s)
flint normal                    155.574     2841.680
flint compact                    96.569     3026.939
flint compact -F                 94.227     2623.404
quartz normal                   169.853     7037.056
quartz compact -F gz            108.783     9249.504
quartz compact -n-F gz          109.650     8090.707
quartz compact                  103.863     9410.721
quartz compact 0.8.4 gz         108.299     8100.171
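To put "significantly faster" and "*much* faster" into numbers, here is a small sketch that turns the totals into per-query averages and ratios. It assumes the figures are total wall-clock seconds per query set and divides by the 65 phrase and 1035 non-phrase queries described below, so the non-phrase average also includes the queries that were run as empty queries. The names `RESULTS`, `per_query_ms` and `ratio` are illustrative only.

```python
# Total time in seconds per query set, copied from the results table:
# (non-phrase total, phrase total).
RESULTS = {
    "flint normal": (155.574, 2841.680),
    "flint compact": (96.569, 3026.939),
    "flint compact -F": (94.227, 2623.404),
    "quartz normal": (169.853, 7037.056),
    "quartz compact -F gz": (108.783, 9249.504),
    "quartz compact -n-F gz": (109.650, 8090.707),
    "quartz compact": (103.863, 9410.721),
    "quartz compact 0.8.4 gz": (108.299, 8100.171),
}

NON_PHRASE_QUERIES = 1035
PHRASE_QUERIES = 65


def per_query_ms(name):
    """Average time per query in milliseconds, as (non-phrase, phrase)."""
    non_phrase, phrase = RESULTS[name]
    return (1000 * non_phrase / NON_PHRASE_QUERIES, 1000 * phrase / PHRASE_QUERIES)


def ratio(slower, faster, column):
    """How many times longer `slower` took than `faster`.

    column 0 is the non-phrase total, column 1 the phrase total.
    """
    return RESULTS[slower][column] / RESULTS[faster][column]


if __name__ == "__main__":
    for name in RESULTS:
        np_ms, ph_ms = per_query_ms(name)
        print(f"{name:26s} {np_ms:8.1f} ms {ph_ms:10.1f} ms")
    print(f"quartz/flint, normal, phrase:  {ratio('quartz normal', 'flint normal', 1):.2f}x")
    print(f"quartz compact/normal, phrase: {ratio('quartz compact', 'quartz normal', 1):.2f}x")
```

With these figures, quartz normal takes about 2.48 times as long as flint normal on the phrase set, and compacted quartz about 1.34 times as long as non-compacted quartz. For flint normal that works out to roughly 44 s per phrase query against about 0.15 s per non-phrase query.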
The benchmark was done by creating a separate directory on a pretty fast
hard drive (WD Raptor 36GB 10k rpm SATA) that holds only the database
under test. The machine has only 1GB of memory, so it was pretty much
I/O-bound during the phrase queries.
For each database, the script first removed the previous database and
then copied the current one to that same disk; this is not included in
the timings. It then recorded the current time in seconds, ran all
queries from a file whose parsed form has no PHRASE part, and after that
ran the queries that do contain PHRASE searches.
This yielded 65 phrase queries and 1035 other queries. Queries such as
"morelike" or boolean-only queries were executed as empty queries, since
I was too lazy to implement those correctly.
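The measurement procedure can be sketched roughly as follows. This is a minimal illustration, not the script actually used: `run_query` is a hypothetical callable standing in for executing one query against the database under test, and `is_phrase_query` is a swapped-in heuristic (a query counts as a phrase query if it contains a quoted string) rather than the parse-and-look-for-PHRASE check described above. It also uses `time.perf_counter` instead of whole seconds.

```python
import time
from typing import Callable, Iterable


def is_phrase_query(query: str) -> bool:
    """Stand-in classifier (assumption): a query is a phrase query if it
    contains a quoted string. The benchmark itself checked whether the
    parsed query had a PHRASE part."""
    return query.count('"') >= 2


def run_benchmark(queries: Iterable[str], run_query: Callable[[str], object]):
    """Time all non-phrase queries first, then all phrase queries.

    Returns {label: (number_of_queries, elapsed_seconds)}. `run_query` is a
    hypothetical callable; copying the database beforehand is not timed.
    """
    queries = list(queries)
    non_phrase = [q for q in queries if not is_phrase_query(q)]
    phrase = [q for q in queries if is_phrase_query(q)]

    timings = {}
    for label, batch in (("non-phrase", non_phrase), ("phrase", phrase)):
        start = time.perf_counter()
        for q in batch:
            run_query(q)
        timings[label] = (len(batch), time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    sample = ['xapian flint', '"quartz database"', 'benchmark', '"phrase search" speed']
    result = run_benchmark(sample, run_query=lambda q: None)  # no-op query runner
    for label, (count, seconds) in result.items():
        print(f"{label}: {count} queries in {seconds:.6f} s")
```

Passing a recording function such as `list.append` as `run_query` is an easy way to confirm that all non-phrase queries run before the phrase ones.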
I cannot explain from the hardware or benchmark setup why the compacted
quartz databases are so much slower with phrase queries. At first I
thought it might be the way they were laid out on disk during their
creation: copydatabase may tend to place the records for a specific
document close to each other, while quartzcompact copies the database
table by table. But since I copied them all with the standard Unix copy
command, that should not be the case in the benchmarks I did now.
I haven't yet verified whether all databases returned the same results.
I'll have to do that to check whether the flint results were actually
correct, but so far I have no reason to believe otherwise.
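One way to do that check, sketched under assumptions: each database's results have been dumped as a mapping from query string to its ranked list of document ids (a hypothetical format; no such dump was produced in this benchmark), and the hypothetical `differing_queries` reports the queries whose top-N lists disagree with a reference database.

```python
from typing import Dict, List

# Hypothetical dump format: query string -> ranked list of document ids.
Results = Dict[str, List[int]]


def differing_queries(reference: Results, candidate: Results, top_n: int = 10) -> List[str]:
    """Return queries whose top-`top_n` document ids differ between two dumps.

    A query present in only one of the dumps is reported as differing as well.
    """
    diffs = []
    for query in sorted(set(reference) | set(candidate)):
        ref = reference.get(query)
        cand = candidate.get(query)
        if ref is None or cand is None or ref[:top_n] != cand[:top_n]:
            diffs.append(query)
    return diffs


if __name__ == "__main__":
    # Toy data, not real benchmark output.
    quartz = {"xapian": [3, 1, 2], "flint": [7, 8], "quartz": [4]}
    flint = {"xapian": [3, 1, 2], "flint": [8, 7]}
    print(differing_queries(quartz, flint))  # ['flint', 'quartz']
```

An exact comparison of ranked lists may flag documents with equal weights that come back in a different order; comparing the top-N lists as sets would be a looser alternative.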
To make sure these aren't one-off numbers, I'm running the benchmarks
twice more, but since each run takes almost a day I've sent these
numbers to the list already.