[Xapian-devel] Performance tests
Richard Boulton
richard at lemurconsulting.com
Tue Jan 2 14:08:04 GMT 2007
Just a note to say that I'm working on a performance test framework for
Xapian, mainly targeted at analysing query speeds. I'm hoping to make
this a reasonably easy system to set up, so that we can get
performance test results on lots of architectures.
For sample data, I'm using an XML dump of Wikipedia - I've got a simple
Python script which converts this into a scriptindex input file (though
it doesn't yet do anything about understanding the wiki mark-up). This
results in a 25GB database (containing 2657375 documents, average length
492 terms), which should give a good basis for benchmarking searches in
an IO bound situation. I should probably also use some smaller corpora
to cover the CPU/memory-IO bound situation.
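
For anyone curious what such a conversion might look like, here is a
minimal sketch. It is not the script mentioned above, which hasn't been
published yet. It assumes the dump wraps each article in a <page>
element containing <title> and <text> elements, and the output field
names "title" and "text" are hypothetical: they would need to match
whatever index script is passed to scriptindex. Like the script
described above, it makes no attempt to interpret the wiki mark-up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit('}', 1)[-1]


def format_field(name, value):
    """Format one field for a scriptindex input file.

    Continuation lines of a multi-line value are prefixed with '='.
    """
    lines = value.splitlines() or ['']
    return '\n'.join([name + '=' + lines[0]] +
                     ['=' + line for line in lines[1:]])


def convert(xml_stream, out):
    """Write one record per <page> element and return the record count.

    Records are separated by a blank line. The field names 'title' and
    'text' are assumptions made for this sketch.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_stream, events=('end',)):
        if local_name(elem.tag) != 'page':
            continue
        title, text = '', ''
        for child in elem.iter():
            name = local_name(child.tag)
            if name == 'title':
                title = child.text or ''
            elif name == 'text':
                text = child.text or ''
        out.write(format_field('title', title) + '\n')
        out.write(format_field('text', text) + '\n\n')
        count += 1
        # Drop the page's children once written, to limit memory use
        # on a very large dump.
        elem.clear()
    return count
```

Calling convert() with the dump opened in binary mode and an output
file opened for writing would produce a file that could then be fed to
scriptindex, given a suitable index script. Streaming with iterparse
rather than loading the whole tree is the natural choice for a dump of
this size.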
I've filed a bug in the bug tracker to track progress on this (#107 -
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=107) - I'm hoping
to eventually get this running on at least one machine on a regular
basis, so that we can track how revisions to the code affect
performance. In particular, I want this in place before I work on bug #100.
At present, I'm just posting this here to let people know that I have
some code which parses Wikipedia XML dumps, so they don't waste time
writing their own - ask me instead. I'll publish the code
when it's all tidied up.
--
Richard