[Xapian-devel] Performance tests

Richard Boulton richard at lemurconsulting.com
Tue Jan 2 14:08:04 GMT 2007

Just a note to say that I'm working on a performance test framework for 
Xapian, mainly targeted at analysing query speeds.  I'm hoping to make 
this into a system that is reasonably easy to set up, so that we can get 
performance test results on lots of architectures.
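
As a rough illustration of what timing query speeds could involve, here is a minimal sketch. It is not the framework itself: `time_queries` and `run_query` are hypothetical names, and `run_query` stands in for whatever actually executes a search against a database.

```python
import statistics
import time


def time_queries(run_query, queries, repeats=3):
    """Return {query: median wall-clock seconds} over `repeats` runs.

    `run_query` is a hypothetical callable that executes one search.
    """
    results = {}
    for query in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            samples.append(time.perf_counter() - start)
        results[query] = statistics.median(samples)
    return results
```

Note that repeated runs of the same query will mostly be served from the operating system's cache, so the median here says more about the cached case than about cold, IO bound performance.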

For sample data, I'm using an XML dump of Wikipedia - I've got a simple 
Python script which converts this into a scriptindex input file (though 
it doesn't yet do anything about understanding the wiki mark-up).  This 
results in a 25GB database (containing 2657375 documents, average length 
492 terms), which should give a good basis for benchmarking searches in 
an IO bound situation.  I should probably also use some smaller corpora 
to cover the CPU/memory-IO bound situation.
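
For illustration only, a conversion of this kind might look roughly like the sketch below. It is not the script described above. The field names `title` and `text` are assumptions (the real names depend on the index script in use), and the wiki mark-up is passed through untouched. The output follows the scriptindex input conventions: `field=value` lines, continuation lines starting with `=`, and a blank line between records.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def write_record(out, title, text):
    """Write one record: 'field=value' lines with '=' continuation lines."""
    # Field names 'title' and 'text' are assumptions for this sketch.
    out.write("title=" + " ".join(title.split()) + "\n")
    lines = text.splitlines() or [""]
    out.write("text=" + lines[0] + "\n")
    for line in lines[1:]:
        out.write("=" + line + "\n")
    out.write("\n")  # a blank line ends the record


def convert(xml_stream, out):
    """Stream <page> elements from a dump; emit one record per page."""
    count = 0
    title, text = "", ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            write_record(out, title, text)
            count += 1
            title, text = "", ""
            elem.clear()  # release the finished page's subtree
    return count
```

Streaming with `iterparse` rather than loading the whole tree matters here, since a full dump is far too large to hold in memory comfortably.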

I've opened a bug in the bug tracker to track progress on this (#107 - 
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=107) - I'm hoping 
to eventually get this running on at least one machine on a regular 
basis, so that we can track how revisions to the code affect 
performance.  In particular, I want this in place before I work on bug #100.

At present, I'm just posting this here to let people know that I have 
some code which parses Wikipedia XML dumps, so they don't waste time 
writing their own - ask me instead.  I'll publish the code 
when it's all tidied up.


