[Xapian-discuss] redundant disk access in xapian-tcpsrv

Olly Betts olly at survex.com
Wed Jun 28 04:03:49 BST 2006


On Tue, Jun 27, 2006 at 09:07:13PM -0400, Rocco Caputo wrote:
> I noticed this while looking through an strace of xapian-tcpsrv to
> determine the reason it's slower at uncached fetches than my
> stand-alone Perl client.  To be fair, my stand-alone test client also
> opens the database in non-lazy mode, so it suffers from the same
> inefficiency that xapian-tcpsrv seems to.

This "lazy" flag isn't available at the API level - it's just there
so that the matcher can look at document values without the backend
checking that the document exists (since it must exist if we have a
posting list entry for it).

> Also, calling open_document() in lazy mode removes the redundant
> lseek() and read() calls, but this doesn't improve performance in my
> situation.

The re-read disk blocks are almost certain to be cached so the overhead
will be tiny compared to everything else going on.

The code here is a bit odd.  If lazy is false, it reads the document
data when the document object is created, and then promptly discards it.
Looking at when this was added, it looks like this is done simply to
check that this document actually exists in the database (and throw an
exception if not).  We can definitely do that better!
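To illustrate the kind of change meant here, a toy sketch follows.  It is
not Xapian's code: ToyBackend, ToyDocument, read_data(), exists() and both
open_document_*() functions are hypothetical names invented for the
example, and a std::map stands in for the on-disk record table.  The first
function mimics the read-and-discard behaviour described above; the second
performs only a key lookup to decide whether to throw.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy backend; none of these names come from Xapian.
struct ToyBackend {
    std::map<unsigned, std::string> records;  // docid -> document data
    mutable int data_reads = 0;               // counts simulated data fetches

    // Simulates the expensive path: fetch the whole record.
    std::string read_data(unsigned did) const {
        ++data_reads;
        auto it = records.find(did);
        if (it == records.end()) throw std::out_of_range("no such document");
        return it->second;
    }

    // Cheaper existence test: a key lookup, no data fetched.
    bool exists(unsigned did) const { return records.count(did) != 0; }
};

struct ToyDocument {
    unsigned did;
};

// Mimics the behaviour described in the message: when not lazy, the data
// is read only so that a missing document throws, then it is discarded.
ToyDocument open_document_old(const ToyBackend& db, unsigned did, bool lazy) {
    if (!lazy) (void)db.read_data(did);
    return ToyDocument{did};
}

// One possible alternative (an assumption, not the actual fix): check for
// existence directly and leave the data unread until it is needed.
ToyDocument open_document_checked(const ToyBackend& db, unsigned did, bool lazy) {
    if (!lazy && !db.exists(did)) throw std::out_of_range("no such document");
    return ToyDocument{did};
}
```

In this toy model both functions throw for a missing docid when lazy is
false, but only the first one increments data_reads.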

Slightly odd that you're getting llseek and read from strace though.
Xapian should use pread instead if it's available, and I'd expect pread
to be a syscall.  What platform is this?  What do the PREAD related bits
of config.h say?
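For reference, the difference between the two access patterns looks
roughly like this at the POSIX level.  This is a minimal sketch, not code
from Xapian: read_block_seek(), read_block_pread() and reads_agree() are
names made up for the example, and the 4-byte block size in the self-check
is arbitrary.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// Two syscalls per block: position the file offset, then read.
// As a side effect the descriptor's file offset moves.
ssize_t read_block_seek(int fd, void* buf, size_t block_size, off_t block_no) {
    off_t pos = block_no * static_cast<off_t>(block_size);
    if (lseek(fd, pos, SEEK_SET) == static_cast<off_t>(-1)) return -1;
    return read(fd, buf, block_size);
}

// One syscall per block: pread takes the offset as an argument and
// leaves the descriptor's file offset unchanged.
ssize_t read_block_pread(int fd, void* buf, size_t block_size, off_t block_no) {
    off_t pos = block_no * static_cast<off_t>(block_size);
    return pread(fd, buf, block_size, pos);
}

// Self-check: write 16 known bytes to a temporary file and fetch
// block 1 (bytes 4..7) both ways; returns true if both give "4567".
bool reads_agree() {
    char path[] = "/tmp/pread_demoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // file is removed once the descriptor is closed
    const char data[] = "0123456789abcdef";
    if (write(fd, data, 16) != 16) {
        close(fd);
        return false;
    }
    char a[4], b[4];
    bool ok = read_block_seek(fd, a, 4, 1) == 4 &&
              read_block_pread(fd, b, 4, 1) == 4 &&
              std::memcmp(a, b, 4) == 0 &&
              std::memcmp(a, "4567", 4) == 0;
    close(fd);
    return ok;
}
```

Under strace the first function should show up as a seek followed by a
read, and the second as a single pread-style call, which is why the trace
you describe suggests pread isn't being used.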

Cheers,
    Olly


