[Xapian-tickets] [Xapian] #671: Performance issues when querying over large number of local databases (shards)

Xapian nobody at xapian.org
Tue Mar 24 02:37:09 GMT 2015

#671: Performance issues when querying over large number of local databases
 Reporter:  wgreenberg            |             Owner:  olly
     Type:  defect                |            Status:  new
 Priority:  normal                |         Milestone:
Component:  Other                 |           Version:
 Severity:  normal                |        Resolution:
 Keywords:  sharding performance  |        Blocked By:
 Blocking:                        |  Operating System:  Linux

Comment (by olly):

 I don't think this patch is doing what you think it is.

 Each table has a built-in cursor (`C`), which you're using here.  It's
 used for operations which are implemented using a cursor, but for which
 the cursor doesn't need to live on after we return to the caller.  This
 mostly just avoids having to create a temporary cursor for every such
 operation, but it also has the benefit that the blocks needed may already
 have been loaded by a previous operation.
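 To make the built-in cursor idea concrete, here is a minimal sketch on a
 hypothetical in-memory three-level B-tree (level 2 is the root, level 0 the
 leaves).  `Block`, `Table`, `load()`, `contains()`, `find_in_block()` and
 `make_tree()` are invented for illustration and are not Xapian's API; `C`
 here only records which block number each level of the cursor holds, and
 "disk" is a `std::map`.  It shows a one-shot lookup that uses `C` as scratch
 space and skips reads for blocks a previous lookup already left there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy B-tree, not Xapian code. Level 2 is the root, level 0 the leaves.
struct Block {
    std::vector<std::string> keys;  // keys[i] = first key under children[i]
    std::vector<int> children;      // child block numbers (empty in leaves)
};

// Index of the last entry whose key is <= key, clamped to the first entry.
std::size_t find_in_block(const Block& b, const std::string& key) {
    auto it = std::upper_bound(b.keys.begin(), b.keys.end(), key);
    return it == b.keys.begin() ? 0 : static_cast<std::size_t>(it - b.keys.begin()) - 1;
}

std::map<int, Block> make_tree() {
    std::map<int, Block> t;
    t[0] = {{"a", "m"}, {1, 2}};       // root (level 2)
    t[1] = {{"a", "f"}, {3, 4}};       // branch blocks (level 1)
    t[2] = {{"m", "p"}, {5, 6}};
    t[3] = {{"apple", "banana"}, {}};  // leaf blocks (level 0)
    t[4] = {{"fig", "grape"}, {}};
    t[5] = {{"mango", "orange"}, {}};
    t[6] = {{"pear", "plum"}, {}};
    return t;
}

struct Table {
    std::map<int, Block> disk;  // block number -> block contents "on disk"
    int root;                   // block number of the root
    std::vector<int> C;         // built-in cursor: block held at each level (-1 = none)
    int reads;                  // blocks actually fetched from "disk"

    // Put block n at level j of the built-in cursor, skipping the read when a
    // previous operation already left that block there.
    const Block& load(int j, int n) {
        assert(j >= 0 && j < static_cast<int>(C.size()));
        if (C[j] != n) {
            C[j] = n;
            ++reads;
        }
        return disk.at(n);
    }

    // One-shot lookup: uses C as scratch space instead of a temporary cursor;
    // nothing about C needs to be meaningful after we return.
    bool contains(const std::string& key) {
        int n = root;
        for (int j = static_cast<int>(C.size()) - 1; j > 0; --j) {
            const Block& b = load(j, n);
            n = b.children[find_in_block(b, key)];
        }
        const Block& leaf = load(0, n);
        return std::binary_search(leaf.keys.begin(), leaf.keys.end(), key);
    }
};
```

 For example, once `contains("apple")` has loaded the root, branch block 1
 and leaf 3, a following `contains("banana")` walks the same path and needs
 no further reads.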

 The problem with what you're doing is that you just use whatever is
 already in `C`.  For the root block, that's fine, but once `j < level`
 we're searching for a key in whichever block of that level happens to be
 in the cursor.  Most of the time that won't be the right block, so we'll
 end up on the first or last entry of the branch block, depending on which
 side of the correct path down the tree that block lies.  So, unless
 something else happens to be making sure that `C` points to the right
 place, you're mostly pre-reading an essentially arbitrary set of blocks
 here.
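 The failure mode can be shown on the same kind of hypothetical toy tree.
 `flawed_preread()` is a reconstruction from the description above, not the
 patch's actual code: at each branch level it searches whichever block `C`
 already holds and records the child it would pre-read.  `Block`,
 `find_in_block()` and `make_tree()` are likewise invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy B-tree, not Xapian code. Level 2 is the root, level 0 the leaves.
struct Block {
    std::vector<std::string> keys;  // keys[i] = first key under children[i]
    std::vector<int> children;      // child block numbers (empty in leaves)
};

// Index of the last entry whose key is <= key, clamped to the first entry.
std::size_t find_in_block(const Block& b, const std::string& key) {
    auto it = std::upper_bound(b.keys.begin(), b.keys.end(), key);
    return it == b.keys.begin() ? 0 : static_cast<std::size_t>(it - b.keys.begin()) - 1;
}

std::map<int, Block> make_tree() {
    std::map<int, Block> t;
    t[0] = {{"a", "m"}, {1, 2}};       // root (level 2)
    t[1] = {{"a", "f"}, {3, 4}};       // branch blocks (level 1)
    t[2] = {{"m", "p"}, {5, 6}};
    t[3] = {{"apple", "banana"}, {}};  // leaf blocks (level 0)
    t[4] = {{"fig", "grape"}, {}};
    t[5] = {{"mango", "orange"}, {}};
    t[6] = {{"pear", "plum"}, {}};
    return t;
}

// Searches whatever block the cursor already holds at each level, without
// first loading the block on the right path for this key.
std::vector<int> flawed_preread(const std::map<int, Block>& disk,
                                const std::vector<int>& C,  // C[j] = block held at level j
                                const std::string& key) {
    std::vector<int> prefetched;
    for (int j = static_cast<int>(C.size()) - 1; j > 0; --j) {
        const Block& b = disk.at(C[j]);  // below the root, possibly a stale block
        prefetched.push_back(b.children[find_in_block(b, key)]);
    }
    return prefetched;
}
```

 With `C = {6, 2, 0}` left over from an earlier lookup of "pear",
 pre-reading for "banana" picks block 1 (correct, since the root is always
 right) and then block 5, the first child of the stale branch block 2, when
 the right leaf is block 3.  A key past the stale block's range lands on its
 last child instead.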

 I guess it still gives a performance boost because we will want some of
 the blocks in that arbitrary set, so pre-reading something is better than
 pre-reading nothing: we get those reads for free while other work is going
 on.

 But I think this should be calling `block_to_cursor()` while it descends
 the tree, and then do the pre-read at the lowest level (which isn't
 necessarily the leaf level, but there's not much point iterating after we
 stop reading blocks).
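 One reading of the suggested fix, again as a sketch on a hypothetical toy
 tree: descend from the root, putting the correct block into the cursor at
 each level with a stand-in for `block_to_cursor()` (the real method's
 signature is not reproduced here), and only issue a pre-read for the child
 found in the lowest block loaded.  Pre-reads are modelled by recording
 block numbers rather than doing real I/O; `Table`, `preread()` and
 `stop_level` are invented names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy B-tree, not Xapian code. Level 2 is the root, level 0 the leaves.
struct Block {
    std::vector<std::string> keys;  // keys[i] = first key under children[i]
    std::vector<int> children;      // child block numbers (empty in leaves)
};

// Index of the last entry whose key is <= key, clamped to the first entry.
std::size_t find_in_block(const Block& b, const std::string& key) {
    auto it = std::upper_bound(b.keys.begin(), b.keys.end(), key);
    return it == b.keys.begin() ? 0 : static_cast<std::size_t>(it - b.keys.begin()) - 1;
}

std::map<int, Block> make_tree() {
    std::map<int, Block> t;
    t[0] = {{"a", "m"}, {1, 2}};       // root (level 2)
    t[1] = {{"a", "f"}, {3, 4}};       // branch blocks (level 1)
    t[2] = {{"m", "p"}, {5, 6}};
    t[3] = {{"apple", "banana"}, {}};  // leaf blocks (level 0)
    t[4] = {{"fig", "grape"}, {}};
    t[5] = {{"mango", "orange"}, {}};
    t[6] = {{"pear", "plum"}, {}};
    return t;
}

struct Table {
    std::map<int, Block> disk;    // block number -> block contents "on disk"
    int root;                     // block number of the root
    std::vector<int> C;           // built-in cursor: block number held at each level
    std::vector<int> reads;       // blocks read synchronously into C
    std::vector<int> prefetched;  // blocks we only asked to have pre-read

    // Stand-in for block_to_cursor(): make level j of C hold block n.
    void block_to_cursor(int j, int n) {
        if (C[j] != n) {
            C[j] = n;
            reads.push_back(n);
        }
    }

    // Descend from the root down to stop_level, loading the right block at
    // each level, then pre-read the child found in the lowest block loaded.
    void preread(const std::string& key, int stop_level) {
        const int top = static_cast<int>(C.size()) - 1;
        assert(stop_level >= 1 && stop_level <= top);
        int n = root;
        for (int j = top; j >= stop_level; --j) {
            block_to_cursor(j, n);
            const Block& b = disk.at(n);
            n = b.children[find_in_block(b, key)];
        }
        prefetched.push_back(n);
    }
};
```

 Starting from the same stale cursor `{6, 2, 0}`, `preread("banana", 1)`
 reads branch block 1 into `C` and pre-reads leaf 3, the correct one.  A
 call such as `preread("pear", 2)` stops at the root and pre-reads branch
 block 2, so the pre-read need not happen at the leaf level.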

Ticket URL: <http://trac.xapian.org/ticket/671#comment:3>
Xapian <http://xapian.org/>
