[Xapian-discuss] Initial benchmark results: quartz and flint

Wed Jun 29 12:19:54 BST 2005

Hi List,

I've done some benchmarking and have the first set of results here. The 
databases (their sizes and parameters) were described on the list earlier 
this month, if you're interested.

These results suggest that flint is significantly faster to search, for 
both phrase queries and normal queries. Another, somewhat surprising, 
result is that the non-compacted quartz databases are *much* faster with 
phrase queries than the compacted ones.

                          non-phrase (s)   phrase (s)
flint normal                     155.574     2841.680
flint compact                     96.569     3026.939
flint compact -F                  94.227     2623.404

quartz normal                    169.853     7037.056
quartz compact -F gz             108.783     9249.504
quartz compact -n -F gz          109.650     8090.707
quartz compact                   103.863     9410.721
quartz compact 0.8.4 gz          108.299     8100.171
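To put the totals in proportion, here is a small Python sketch that 
computes slowdown ratios and per-query averages from the timings above. 
The numbers are transcribed by hand (with "." as the decimal separator), 
and the query counts are the 1035 non-phrase and 65 phrase queries 
mentioned below; nothing here comes from Xapian itself.

```python
# Total times in seconds, transcribed from the results above:
# (non-phrase total, phrase total) per database configuration.
TIMINGS = {
    "flint normal": (155.574, 2841.680),
    "flint compact": (96.569, 3026.939),
    "flint compact -F": (94.227, 2623.404),
    "quartz normal": (169.853, 7037.056),
    "quartz compact -F gz": (108.783, 9249.504),
    "quartz compact -n -F gz": (109.650, 8090.707),
    "quartz compact": (103.863, 9410.721),
    "quartz compact 0.8.4 gz": (108.299, 8100.171),
}

NON_PHRASE_QUERIES = 1035
PHRASE_QUERIES = 65

NON_PHRASE, PHRASE = 0, 1  # column indices into each TIMINGS tuple


def ratio(slower: str, faster: str, column: int) -> float:
    """How many times longer `slower` took than `faster` for one query set."""
    return TIMINGS[slower][column] / TIMINGS[faster][column]


def per_query(name: str) -> tuple:
    """Average seconds per query (non-phrase, phrase) for one configuration."""
    non_phrase, phrase = TIMINGS[name]
    return non_phrase / NON_PHRASE_QUERIES, phrase / PHRASE_QUERIES


if __name__ == "__main__":
    print(f"quartz vs flint (normal, phrase): "
          f"{ratio('quartz normal', 'flint normal', PHRASE):.2f}x")
    print(f"quartz compact vs quartz normal (phrase): "
          f"{ratio('quartz compact', 'quartz normal', PHRASE):.2f}x")
    for name in TIMINGS:
        np_avg, p_avg = per_query(name)
        print(f"{name:26} {np_avg:8.3f} s/query {p_avg:9.3f} s/query")
```

For example, the phrase-query total for normal quartz is roughly 2.48 
times that of normal flint, and compacted quartz takes roughly 1.34 times 
as long as non-compacted quartz on the phrase queries.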

The benchmark was done by creating a separate directory on a pretty fast 
hard drive (WD Raptor 36GB 10k rpm SATA) that holds only the database 
under test. The machine has only 1GB of memory, so the phrase queries 
were largely I/O-bound.
Before each run, the script removed the previous database and copied the 
current one to that same disk; this step is not included in the timings.
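The untimed setup step could be sketched in Python roughly as follows. 
This is an illustration, not the script actually used (which relied on 
the standard Unix copy command); the function name `stage_database` and 
the directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def stage_database(source: Path, bench_dir: Path) -> Path:
    """Replace the benchmark copy in `bench_dir` with a fresh copy of `source`.

    Hypothetical helper: removes whatever database was staged previously
    under the same name, then copies the whole database directory over.
    """
    target = bench_dir / source.name
    if target.exists():
        shutil.rmtree(target)  # remove the previous database
    shutil.copytree(source, target)  # copy the current database to the bench disk
    return target
```

Keeping this step outside the timed section means the copy itself never 
shows up in the reported numbers.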

I then recorded the current time in seconds and ran, from a query file, 
all queries that did not parse to contain a PHRASE part, followed by the 
queries that did do PHRASE searches.
This gave 65 phrase queries and 1035 other queries. "morelike" queries, 
boolean-only queries, etc. were executed as empty queries, since I was 
too lazy to implement them correctly.
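The timing procedure can be sketched as a small Python harness. Two 
assumptions here: `run_query` is a hypothetical callable standing in for 
whatever executes one query against the database, and `is_phrase_query` 
uses a crude quote-counting heuristic, whereas the real benchmark decided 
this by parsing each query and checking for a PHRASE part.

```python
import time
from typing import Callable, Iterable, NamedTuple


class BenchResult(NamedTuple):
    non_phrase_seconds: float
    phrase_seconds: float
    non_phrase_count: int
    phrase_count: int


def is_phrase_query(query: str) -> bool:
    # Assumption: treat any query containing a quoted string as a phrase
    # query. This only approximates "parses to contain a PHRASE part".
    return query.count('"') >= 2


def time_queries(queries: Iterable[str],
                 run_query: Callable[[str], None]) -> BenchResult:
    """Run all non-phrase queries, then all phrase queries, timing each set."""
    queries = list(queries)
    non_phrase = [q for q in queries if not is_phrase_query(q)]
    phrase = [q for q in queries if is_phrase_query(q)]

    start = time.perf_counter()
    for q in non_phrase:
        run_query(q)
    middle = time.perf_counter()
    for q in phrase:
        run_query(q)
    end = time.perf_counter()

    return BenchResult(middle - start, end - middle, len(non_phrase), len(phrase))
```

`time.perf_counter()` is used rather than wall-clock seconds because it 
is monotonic and has finer resolution; for runs lasting minutes to hours 
either would do.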

Nothing in the hardware or benchmark setup explains why the compacted 
quartz databases are so much slower with phrase queries. At first I 
thought it might be due to how they were laid out on disk when they were 
created: copydatabase may tend to place the records for a given document 
close to each other, while quartzcompact copies the database table by 
table. But since I copied them using the standard Unix copy command, that 
should not be the case for the benchmarks I ran now.

I haven't yet verified that all databases returned the same results. I'll 
have to do that to confirm that the flint results are actually correct, 
but so far I have no reason to believe otherwise.

To make sure these are not one-off numbers, I'm running the benchmarks 
twice more, but since that takes almost a day per run, I'm sending these 
numbers to the list now.

Best regards,

Arjen