[Xapian-discuss] Phrase search performance

Olly Betts olly at survex.com
Tue Feb 21 20:59:21 GMT 2006


On Tue, Feb 21, 2006 at 07:49:19AM +0100, Arjen van der Meijden wrote:
> I've split up omega's logfile in either normal queries and slow queries, 
> where the latter took more than 2 seconds. Currently there are 422850 
> normal queries (averaging to 0.06 seconds/query, 99% done within 0.5 
> second) and 6824 slow queries (about 1.6% of the total). The average of 
> those slow queries is about 17 seconds, where about 95% is done within 
> one minute. But there are a few queries taking up to 10 minutes, for 
> instance this one: "constante download 1.5 en upload 0.7".

This is a bad case because we (foolishly) currently index 1.5 as the two
terms 1 and 5, and then use a phrase search for them at search time.
Since 1 and 5 are very common terms, we're creating a bad case for no
good reason here.  If we indexed this as the single term "1.5" then it'd
be much faster.
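To illustrate the difference, here is a minimal sketch of the two term
splitting strategies.  It is not Xapian's or omega's actual term
generation code; the function names and regular expressions are
hypothetical and only show what ends up as a term in each case.

```python
import re

def split_on_punctuation(text):
    # Hypothetical splitter: every run of letters/digits is its own term,
    # so "1.5" becomes the two very common terms "1" and "5".
    return re.findall(r"[a-z0-9]+", text.lower())

def keep_decimals(text):
    # Hypothetical splitter: digits joined by "." stay together as one term,
    # so "1.5" is a single, much rarer term and needs no phrase check.
    return re.findall(r"[0-9]+(?:\.[0-9]+)+|[a-z0-9]+", text.lower())

query = "constante download 1.5 en upload 0.7"
print(split_on_punctuation(query))
print(keep_decimals(query))
```

With the first splitter the query contains four single-digit terms that
have to be matched as two phrases; with the second it contains the two
terms "1.5" and "0.7" instead.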

> Anyway, the easiest way to improve your set up is adding RAM. 
> I'm not sure how fast your SAN is compared to some of the faster local 
> disks, but I imagine a single sata WD Raptor locally may be able to beat 
> it in terms of throughput and response times, let alone a few in raid.

Response time may be the key issue - the read pattern is essentially
random access to 8K (by default) blocks from the Btree files.
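A rough way to see how a given storage device copes with this access
pattern is to time random block-sized reads from a large file on it.
The sketch below is only an approximation of the pattern described
above, not a Xapian tool; the function name and defaults are
assumptions, and reads served from the operating system's cache will
look much faster than reads that really hit the disk or SAN.

```python
import os
import random
import time

BLOCK_SIZE = 8192  # assumed to match the default 8K block size

def time_random_block_reads(path, reads=1000, block_size=BLOCK_SIZE, seed=0):
    """Return the latency in seconds of each random block-sized read."""
    n_blocks = os.path.getsize(path) // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    latencies = []
    # buffering=0 avoids Python's own read buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            offset = rng.randrange(n_blocks) * block_size
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    return latencies
```

Comparing the median latency for a file on the SAN with one on a local
disk gives a first impression of the response-time gap.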

One thing to try would be to copy position.DB onto fast local storage
and symlink it into the database directory on the SAN (assuming you've
enough local storage).  If that improves search times appreciably, you
know what the current bottleneck is.
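The copy-and-symlink experiment could be scripted roughly as follows.
This is a sketch under assumptions: the function name and the ".orig"
backup suffix are made up for illustration, it touches only the
position.DB file named above, and it assumes nothing is updating the
database while it runs.

```python
import os
import shutil

def move_table_to_local(db_dir, local_dir, table="position.DB"):
    """Copy one table file to local_dir and symlink it back into db_dir."""
    src = os.path.join(db_dir, table)
    dst = os.path.join(local_dir, table)
    backup = src + ".orig"  # hypothetical suffix, kept so the change is easy to undo
    os.makedirs(local_dir, exist_ok=True)
    shutil.copy2(src, dst)                 # copy onto fast local storage
    os.rename(src, backup)                 # move the SAN copy out of the way
    os.symlink(os.path.abspath(dst), src)  # database directory now points at the local copy
    return backup
```

To undo the experiment, remove the symlink and rename the ".orig" file
back to its original name.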

You might find using a different NFS protocol version to mount the SAN
helps too.

> If you can get 4GB of memory in your box, you'll likely see some more 
> improvements. Keep in mind that the initial query time will still be low 
> if the SAN isn't too fast.

Oh yes, beware that the first few queries from "cold" (i.e. nothing
cached) on a large database are going to be atypically slow.  But it
doesn't take many queries before you've cached the top levels of each
Btree, and response time improves.  The new Btree manager I'm working
on for flint will be able to fill branch blocks much better, so Btrees
will be shallower (for the same data) and this effect should be less
marked.
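As a back-of-the-envelope illustration of why fuller branch blocks give
a shallower Btree, the sketch below counts levels for a given number of
entries per branch block.  All the numbers are hypothetical and are not
measurements of quartz or flint; the function is a simplified model,
not the Btree manager's own logic.

```python
def btree_levels(n_items, items_per_leaf, branch_fanout):
    """Number of levels (leaf level included) needed to index n_items."""
    blocks = -(-n_items // items_per_leaf)  # ceiling division: leaf blocks
    levels = 1
    while blocks > 1:
        blocks = -(-blocks // branch_fanout)  # branch blocks one level up
        levels += 1
    return levels

# Hypothetical figures: 100 million items, 100 items per leaf block.
print(btree_levels(100_000_000, 100, 90))   # branch blocks half full -> 5
print(btree_levels(100_000_000, 100, 180))  # branch blocks full      -> 4
```

Each level removed is one fewer block to fetch per lookup when nothing
is cached yet.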

Cheers,
    Olly


