[Xapian-discuss] Xapian performance

Arjen van der Meijden acmmailing at tweakers.net
Tue May 24 18:55:56 BST 2005


Hi Georges,

On 24-5-2005 18:04, Georges Dupret wrote:
> Hi!
> 
> While promoting xapian, I have been asked to answer the following
> question: How many queries per second is Xapian able to answer?
> The database size would be around 15 GB of uncompressed text, on a
> regular desktop machine with one CPU and 1G of RAM.

I don't think there is a single answer to that question. We have a
similar-sized database (about 18 GB) on a much heavier machine (dual
Xeon 2.8 GHz, 4 GB of memory, two SCSI disks in RAID 0 dedicated to the
Xapian database).
It can handle quite a lot of queries. Searching is not its only task,
although it is its most expensive one in terms of performance.

The load on the machine varies a lot, but averages out at about 3. In
the past six hours, the busiest hour saw 2770 real-user queries. But I
doubt that is the maximum number of queries it can handle, especially
if there were no positional searches (string matches, NEAR searches,
etc.).

These are the numbers of queries the machine has had to process since
about midnight on May 22nd:

    86098 normalsearches.log
     3406 slowsearches.log
       31 error.log
    89535 total
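The listing above looks like line counts of the three logs, assuming one
logged query per line. A quick breakdown of those numbers shows how rare
the slow queries are relative to the total:

```python
# Query counts copied from the log listing above.
counts = {
    "normalsearches.log": 86098,
    "slowsearches.log": 3406,
    "error.log": 31,
}

total = sum(counts.values())


def share(name: str) -> float:
    """Return the percentage of all logged queries that ended up in one log."""
    return 100.0 * counts[name] / total


for name in counts:
    print(f"{name:20s} {counts[name]:6d} {share(name):6.2f}%")
```

So roughly 3.8% of the queries crossed the slow-search threshold, and
errors were about 0.03% of the total.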

A query is logged in slowsearches.log if it took more than 2 seconds
of pure search time (the total time can be longer once result
processing is included).
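A minimal sketch of that slow-query rule, assuming a hypothetical
`search` callable that stands in for the real Xapian/Omega query call.
Only the search call itself is timed, so result processing is excluded.
Sending failed queries to error.log is an assumption, not something
stated above, and the log line format is made up for illustration.

```python
import time
from typing import Callable, List, Tuple

# Threshold from the text: more than 2 s of pure search time counts as slow.
SLOW_THRESHOLD_S = 2.0


def timed_search(
    query: str,
    search: Callable[[str], List[str]],
    threshold_s: float = SLOW_THRESHOLD_S,
) -> Tuple[List[str], str, float]:
    """Run `search` (hypothetical stand-in for the real query call) and
    decide which log the query belongs in.

    Only the search call is timed; any result processing the caller does
    afterwards is excluded, mirroring "pure search time".
    """
    start = time.perf_counter()
    try:
        results = search(query)
    except Exception:
        # Assumption: failed queries are what end up in error.log.
        return [], "error.log", time.perf_counter() - start
    elapsed = time.perf_counter() - start
    log_name = "slowsearches.log" if elapsed > threshold_s else "normalsearches.log"
    return results, log_name, elapsed


def log_query(query: str, log_name: str, elapsed: float) -> None:
    """Append one tab-separated line (elapsed seconds, query) to the log."""
    with open(log_name, "a", encoding="utf-8") as f:
        f.write(f"{elapsed:.3f}\t{query}\n")
```

Typical use would be `results, name, elapsed = timed_search(q, my_search)`
followed by `log_query(q, name, elapsed)`, where `my_search` is whatever
function actually runs the query.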

I have never really benchmarked Omega's capacity; it is basically just
"fast enough", apart from some corner cases with positional searches.
A while back (when the database was smaller) we ran it on a lower-end
server (dual Xeon 2.4 GHz, 2 GB RAM, one IDE disk). That machine was
much more heavily loaded at the time, but it managed to handle our
site's load. It could get badly I/O-bound, though, when positional
searches were running at the same time.

Since normal searches take on average about 0.2-0.4 seconds (erring on
the safe side) when the machine is not loaded, I estimate it can easily
reach 5-10 queries/second on our database.
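One way to arrive at a figure like 5-10 queries/second (not necessarily
how it was worked out here) is throughput = concurrent searches /
average latency. The sketch below assumes the dual-Xeon machine runs one
search per CPU and that I/O never becomes the bottleneck; it also
converts the observed peak hour into queries per second for comparison.

```python
def estimated_qps(avg_latency_s: float, concurrent_searches: int) -> float:
    """Rough throughput ceiling: each concurrent search slot completes
    1 / avg_latency_s queries per second, ignoring I/O contention."""
    return concurrent_searches / avg_latency_s


# Assumption: two CPUs, one search running on each.
for latency in (0.2, 0.4):
    print(f"{latency:.1f} s/query -> {estimated_qps(latency, 2):.1f} queries/s")

# Observed peak from the logs: 2770 queries in one hour.
peak_qps = 2770 / 3600
print(f"observed peak: {peak_qps:.2f} queries/s")
```

The observed peak of about 0.77 queries/second sits well below the
estimate. Because the formula ignores I/O contention, which is exactly
what positional searches cause, it is best read as an upper bound.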
Of course, it all depends on your documents, the number of distinct
terms, the kinds of queries you get, etc.
If your documents are very clean (no odd terms, so the number of
distinct terms is much lower) and well filtered for stop words, you
might be able to achieve twice our performance or more.
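A minimal, library-agnostic sketch of stop-word filtering at indexing
time. The tokenizer and the tiny stop-word list are hypothetical and
this is not Xapian's own API. Very common words have the longest
posting lists, so leaving them out of the index shrinks the data a
query has to touch.

```python
import re
from collections import Counter
from typing import AbstractSet

# Hypothetical, deliberately tiny stop-word list; real lists are longer.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "it"})


def index_terms(text: str, stop_words: AbstractSet[str] = STOP_WORDS) -> Counter:
    """Lower-case the text, split it into word tokens, drop stop words,
    and count how often each remaining term occurs."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


doc = "The speed of the search depends on the index and the queries."
print(index_terms(doc, stop_words=set()))  # no filtering: 9 distinct terms
print(index_terms(doc))                    # filtered: 6 distinct terms
```

With filtering, the four occurrences of "the" plus "of" and "and" never
reach the index, which is where the saving comes from on a large
collection.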

I hope this is of some help. Please note that I don't think
"#queries/second" is a very useful figure in this context. The amount
of data it can search "fast enough to satisfy the users under your
normal load" is much more interesting (and harder to define).

Best regards,

Arjen


