[Xapian-discuss] xapian performance

Fernando Nemec fernando.nemec at folha.com.br
Mon Dec 4 13:37:44 GMT 2006

Hi Olly,

Thanks for your reply. As far as I can see, there are no more easy ways
to increase performance in my particular case. So I'm going to wait for
Xapian's next version to see if the new B-tree manager can help me.
Thanks again!


Thursday, November 30, 2006, 10:02:51 PM, you wrote:

> On Thu, Nov 23, 2006 at 12:25:27PM -0200, Fernando Nemec wrote:
>> <!--Xapian::Query(lula)-->
>> 1 blocks read from /local/xapian/newdb/record.
>> 4369 blocks read from /local/xapian/newdb/value.

> Hmm, I don't think you mentioned you were using values.  That adds to the
> number of blocks which we need to look at, but also if you're sorting on
> a value there are some matcher optimisations which can't be used so the
> matcher will generally need to consider more documents anyway.

>>              total       used       free     shared    buffers     cached
>> Mem:       1034764    1019508      15256          0       3556     980372

> So it looks like we're getting a lot of the 1GB being used as disk
> cache, which is good.

>> == CASE 2
>> <!--Xapian::Query((presidente PHRASE 2 lula))-->
>> 1 blocks read from /local/xapian/newdb/record.
>> 3023 blocks read from /local/xapian/newdb/value.
>> 3 blocks read from /local/xapian/newdb/termlist.
>> 153036 blocks read from /local/xapian/newdb/position.
>> 380 blocks read from /local/xapian/newdb/postlist.

> But if you do the sums here: blocks are 8K by default, and we're reading
> 156443 of them, which is 1.19GB of data, or about 265MB more than we can
> have cached (actually some blocks may be read more than once in the
> above counts, so this is probably a slight over-estimate).

> So depending on initial cache state, we need to read between 265MB and
> 1.19GB of data from disk, with some seeking around between reads.
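
The sums above can be reproduced with a short Python sketch. It uses only the block counts from CASE 2, the "cached" figure from the free output, and the 8K default block size Olly mentions. The names (blocks_read, summarize) are made up for illustration and are not part of Xapian.

```python
# Sketch only: reproduces the arithmetic in the quoted message.
BLOCK_SIZE = 8 * 1024  # bytes; the 8K default block size stated above

# Block counts reported for CASE 2 (presidente PHRASE 2 lula).
blocks_read = {
    "record": 1,
    "value": 3023,
    "termlist": 3,
    "position": 153036,
    "postlist": 380,
}

CACHED_KB = 980372  # "cached" column from the free output, in KB


def summarize(blocks, cached_kb, block_size=BLOCK_SIZE):
    """Return (total blocks, total bytes, bytes that cannot fit in cache)."""
    total_blocks = sum(blocks.values())
    total_bytes = total_blocks * block_size
    excess_bytes = max(0, total_bytes - cached_kb * 1024)
    return total_blocks, total_bytes, excess_bytes


total_blocks, total_bytes, excess_bytes = summarize(blocks_read, CACHED_KB)
print(f"{total_blocks} blocks = {total_bytes / 2**30:.2f} GB")
print(f"exceeds the disk cache by about {excess_bytes / 2**20:.0f} MB")
```

As noted above, some blocks may be counted more than once, so both figures are upper bounds rather than exact amounts.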

> A quick test shows my dev box can read a total of 1.4GB of data from 3
> (probably mostly sequential) uncached files on a SATA2 disk in 37
> seconds, so if the disk heads have to seek around a bit, I can see
> why this query is slow.
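
As a rough illustration of that timing, the sketch below turns the 1.4GB in 37 seconds figure into a throughput and applies it to the 265MB and 1.19GB bounds. It assumes my disk behaves like Olly's dev box, which I have not measured, and it ignores seek time entirely, so the results are best-case estimates only.

```python
# Sketch only: best-case read time at the mostly sequential rate quoted above.
MB_PER_GB = 1024

measured_mb = 1.4 * MB_PER_GB   # 1.4GB read in the quick test
measured_seconds = 37.0         # time taken on the dev box
throughput = measured_mb / measured_seconds  # MB per second


def read_seconds(size_mb, mb_per_second=throughput):
    """Time to read size_mb at a steady rate, with no seeks accounted for."""
    return size_mb / mb_per_second


best_case = read_seconds(265)               # cache already warm
worst_case = read_seconds(1.19 * MB_PER_GB)  # nothing cached

print(f"throughput: {throughput:.1f} MB/s")
print(f"estimated read time: {best_case:.0f} to {worst_case:.0f} seconds")
```

Any seeking between reads would push the real times above these numbers.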

> Short term, more RAM will help a lot as then you'll be able to have most
> of the skeleton of the position list Btree permanently cached.  And (if
> you don't have one already) a fast RAID disk setup will help reduce the
> cost of disk cache misses.

> The new B-tree manager should also improve this once I have it ready to
> merge in, as it will reduce the number of non-leaf blocks needed to
> store a given table so there's more chance we'll have the required branch
> blocks in cache.  The reduction should be particularly good for the
> position list table.

> There may also be other easy gains remaining.  I'll see if I can think
> of anything.

> Cheers,
>     Olly

Fernando Nemec
fernando.nemec at folha.com.br
