[Xapian-discuss] Search performance issues and profiling/debugging

Wed Oct 24 23:09:34 BST 2007

On Wed, Oct 24, 2007 at 11:04:01PM +0200, Ron Kass wrote:

> Sorry, seems I forgot to paste the statistics for the 100 consecutive 
> runs we did on the 'no recip' search..
> Here it is
> 
>    Max      : 40.845
>    Min       : 0.973
>    Average : 1.739141414
>    StDev    : 4.161330613

Ewww, nasty. If this is Xapian's fault, there's something really crazy
going on.

Okay, some quick observations. (Without sitting down at your system
it's difficult to do more.)

 * seems to take ten cycles to settle down; this feels really high to
   me, but as long as you do a manual warm-up on your system after a
   rebuild it shouldn't actually matter all that much (I do wonder what
   the VM usage for the non-leaf blocks in your b-trees comes to,
   though).

 * there is a vaguely cyclic issue; I'd be inclined to look for
   sweeper 'scripts' in the OS, something screwy in the FS or VM
   layer, or (again) maybe something in the virtualisation layer.

   Something you could try is to put a 10 second sleep between each
   run and do it again. If the period changes, it's most likely a
   timed system thing. I don't think it is, though.

 * the cycles, if that's what they are, seem to become longer,
   suggesting (to me) that the VM system is getting better at
   understanding your load.

 * about halfway through the test you're hit by a random spike which
   causes most of the damage; remove that and your standard deviation
   becomes reasonable (although not great, because of the warm-up
   period; remove that as well and it becomes pretty sane).

 * I don't like the idea of ~1 second for the mset to build, but
   without knowing a lot more about your system I have no idea where
   to attack this from (except that profiling will help - so annoying
   that it isn't working for you :-( ).
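
To make the sleep experiment and the "remove the warm-up and the spike"
comparison concrete, here is a rough Python sketch. Everything in it is
an assumption for illustration: `search` stands in for whatever runs
your query, `time_runs` and `summarise` are made-up helper names, and
the "more than spike_factor times the median" rule is just one crude
way to call something a spike, not anything Xapian provides.

```python
import statistics
import time


def time_runs(search, runs=100, pause=0.0):
    """Time `search()` `runs` times, sleeping `pause` seconds between runs.

    `search` is a hypothetical zero-argument callable that performs one
    query; swap in whatever your test harness actually does.
    """
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        search()
        timings.append(time.perf_counter() - start)
        # Only sleep between runs, not after the last one.
        if pause and i < runs - 1:
            time.sleep(pause)
    return timings


def summarise(timings, warmup=0, spike_factor=None):
    """Max/min/mean/stdev, optionally ignoring warm-up runs and spikes.

    The first `warmup` timings are dropped. If `spike_factor` is given,
    any remaining timing above spike_factor * median is dropped too
    (an arbitrary rule chosen for this sketch).
    """
    kept = list(timings[warmup:])
    if spike_factor is not None:
        limit = spike_factor * statistics.median(kept)
        kept = [t for t in kept if t <= limit]
    return {
        "n": len(kept),
        "max": max(kept),
        "min": min(kept),
        "mean": statistics.mean(kept),
        # stdev needs at least two data points.
        "stdev": statistics.stdev(kept) if len(kept) > 1 else 0.0,
    }
```

Running `time_runs(search, pause=10)` and comparing where the slow runs
fall against a run with `pause=0` is the sleep test described above;
`summarise(timings, warmup=10, spike_factor=10)` versus
`summarise(timings)` shows how much of the standard deviation comes
from the warm-up and the spike.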

It'd be interesting to see various system usage stats against this. If
you could get something like cacti polling with a subsecond gap there
might be some interesting stuff you could learn about what's causing
the spikes. I've seen semi-regular yet initially inexplicable spikes
in other applications (under real rather than simulated load) that
generally came down to the interactions with the FS layer (which would
make sense here, since to a first approximation that's all you're
actually using in your OS during the tests).
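
If setting up cacti is too much bother, a background sampler is a cheap
stand-in. This is only a sketch under assumptions: it assumes Linux and
reads a few paging counters from /proc/vmstat (`pgpgin`, `pgpgout`,
`pgfault`, `pgmajfault`), and `Sampler`, `read_vmstat` and
`parse_vmstat` are names invented here. Which counters are actually
worth watching on your system is a guess on my part.

```python
import threading
import time


def parse_vmstat(text, keys=("pgpgin", "pgpgout", "pgfault", "pgmajfault")):
    """Pick selected counters out of /proc/vmstat-style 'name value' lines."""
    wanted = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in keys:
            wanted[parts[0]] = int(parts[1])
    return wanted


def read_vmstat(path="/proc/vmstat"):
    """Read the current counters (Linux only; the path is an assumption)."""
    with open(path) as f:
        return parse_vmstat(f.read())


class Sampler:
    """Call `reader()` every `interval` seconds on a background thread."""

    def __init__(self, reader, interval=0.2):
        self.reader = reader
        self.interval = interval
        self.samples = []  # list of (wall-clock timestamp, reader result)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.samples.append((time.time(), self.reader()))
            # wait() returns early once stop is set, so shutdown is prompt.
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

Wrapping the 100 runs in `with Sampler(read_vmstat, interval=0.2) as s:`
and recording a `time.time()` stamp next to each search timing lets you
line the slow runs up against `s.samples` afterwards; the counters are
cumulative, so it is the difference between neighbouring samples that
is interesting.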

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org