[Xapian-devel] chert vs flint vs lucene

Olly Betts olly at survex.com
Thu Jan 22 00:29:14 GMT 2009


On Wed, Jan 21, 2009 at 12:24:44PM -0800, towel moist wrote:
> I tried to look for runtime query serving performance numbers to no avail.
> What is the latency range for random queries at a given rate (say 500 QPS)
> to an index built at a certain (say 5M docs) size, assuming not too many
> concurrent connections?
> 
> I am expecting sub-second but am just curious whether it's more in the
> 500ms or the 100-200ms range.

I'm not sure I can usefully answer a question like this - it will depend
significantly on the nature of the data, the nature of the queries, and
the hardware specs.

The best way to get a feel for it is to build a prototype with realistic
data and queries and see.

BTW, be careful of benchmarking with "random" queries - here's an
example where the picture is very different when you look at queries
using words from the documents vs "nonsense" queries, few of which
actually match any documents:

http://tag1consulting.com/Comparing_Xapian_and_Drupal5_Core_Search

That's a fairly extreme case, but even pulling random words from the
vocabulary won't produce representative queries and may skew results.
Ideally you want to replay query logs from actual users searching over
the same database, but unfortunately you rarely have those when you
start developing a system.
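
If you do have a query log (or can put together a realistic stand-in), a
replay harness along these lines is one way to get latency figures for your
own data. This is only a rough sketch under assumptions of mine: the log is
taken to be a text file with one query per line, `search` is a hypothetical
callable that wraps however your prototype runs a query against the
database, and `load_queries`, `replay_queries` and `summarize` are
illustrative names, not part of any library. It is single-threaded, so it
won't tell you anything about behaviour under concurrent connections.

```python
import time
from typing import Callable, Dict, Iterable, List


def load_queries(path: str) -> List[str]:
    """Read a query log, assumed to hold one query per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def replay_queries(search: Callable[[str], object],
                   queries: Iterable[str],
                   target_qps: float = 0.0) -> List[float]:
    """Run each query through `search`; return per-query latencies in ms.

    If target_qps > 0, queries are started no faster than that rate.
    With target_qps == 0 they are issued back to back.
    """
    interval = 1.0 / target_qps if target_qps > 0 else 0.0
    latencies = []
    next_start = time.perf_counter()
    for query in queries:
        # Sleep until this query's scheduled start time, if we are early.
        delay = next_start - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
        next_start += interval
    return latencies


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Report nearest-rank percentiles; averages hide the slow tail."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def pct(p: int) -> float:
        # Nearest-rank percentile using integer arithmetic: ceil(p*n/100).
        rank = (p * n + 99) // 100
        return ordered[max(rank, 1) - 1]

    return {"count": n, "median": pct(50), "p95": pct(95),
            "p99": pct(99), "max": ordered[-1]}


if __name__ == "__main__":
    # Stand-in search over a toy in-memory "index", purely for illustration.
    toy_index = {"xapian": [1, 2], "search": [2, 3]}
    stats = summarize(replay_queries(lambda q: toy_index.get(q, []),
                                     ["xapian", "search", "zzzz"]))
    print(stats)
```

Look at the median and the tail percentiles rather than just an average,
and run the replay more than once, since results from a cold cache and a
warm one can differ a lot.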

Cheers,
    Olly


