[Xapian-tickets] [Xapian] #682: Collapsing when sorting by value unnecessarily slow

Xapian nobody at xapian.org
Wed Jun 13 02:45:57 BST 2018


#682: Collapsing when sorting by value unnecessarily slow
---------------------+---------------------------
 Reporter:  olly     |             Owner:  olly
     Type:  defect   |            Status:  closed
 Priority:  normal   |         Milestone:  1.5.0
Component:  Matcher  |           Version:  1.2.20
 Severity:  normal   |        Resolution:  fixed
 Keywords:           |        Blocked By:
 Blocking:           |  Operating System:  All
---------------------+---------------------------
Changes (by olly):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 That custom implementation is now in `common/heap.h`.

 I did contact Gareth successfully shortly after the last comment and he
 was going to try to test with git master, but he's not got back to us
 since.

 I had a quick look at the code on the RELEASE/1.4 branch.  Two things
 happen there each time we find a better candidate for a collapse key:

  * We linearly scan through the array the heap is in until we find the
 previous candidate (or don't find it if it has been eliminated from the
 proto-MSet already).  On git master there's another layer of indirection
 here which allows us to do this in O(1).
  * We flag the heap and in need of rebuilding next time we need to use it
 as a heap, and a call to `make_heap` needs most 3*N comparisons, and worst
 case is that we always have to rebuild.  With the custom heap
 implementation we can use `Heap::siftdown()` to replace the element while
 keeping the heap valid in at most 2*log(N) comparisons.

 Using the custom heap implementation looks fairly straightforward (I
 refitted it throughout git master recently and that wasn't hard) but that
 still leaves us with a worst case of O(m*n) where m is the requested MSet
 size and n the number of documents in the database.

 I think trying to retrofit the additional layer of indirection into the
 RELEASE/1.4 branch would be too disruptive, especially as we're a long way
 into the 1.4 stable release series.

 Changing the heap implementation would probably help, but it carries some
 risk for all users while fixing a problem which only users using
 collapsing, and not all those users even.  And it is unlikely to deliver
 the substantial change which git master should.

 So I think we don't try to backport anything for this issue.

--
Ticket URL: <https://trac.xapian.org/ticket/682#comment:5>
Xapian <https://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list