[Xapian-tickets] [Xapian] #686: Backend support for >= 2**32 documents

Xapian nobody at xapian.org
Tue Nov 17 01:16:35 GMT 2015


#686: Backend support for >= 2**32 documents
---------------------------+------------------------------
 Reporter:  olly           |             Owner:  olly
     Type:  defect         |            Status:  new
 Priority:  normal         |         Milestone:  1.3.x
Component:  Backend-Glass  |           Version:  SVN trunk
 Severity:  normal         |        Resolution:
 Keywords:                 |        Blocked By:
 Blocking:                 |  Operating System:  All
---------------------------+------------------------------

Comment (by olly):

 And I have a new encoding for `pack_uint_preserving_sort()` which extends
 up to 64 bits nicely.  For 32-bit values, it takes one fewer byte to
 encode `0x4000`-`0x7fff`, but one more byte to encode
 `0x20000000`-`0x3fffffff`.  Overall that's probably a win for most people,
 with a small penalty if you have much more than half a billion documents,
 and it extends naturally to 64 bits, which seems a sane trade-off.

 The obvious tweak to the current encoding which would allow 64 bits is to
 use 3 bits for the width instead of 2, but that is always the same as or
 worse than my new encoding until you have more than 144 million billion
 documents.

 The basic idea is that the first byte of the encoding is a run of 0 to 6
 `1` bits, followed by a `0` bit.  Any remaining bits store the most
 significant bits of the value.  The number of `1` bits in the run is one
 fewer than the number of bytes of value which follow (so the encoding is
 at least 2 bytes long, and at most 8, or at most 5 for 32-bit values).
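
 A minimal sketch of the scheme as described above, for illustration only.
 The names `pack_sketch()` and `unpack_sketch()` are hypothetical and this
 is not the actual `pack_uint_preserving_sort()` code.  It assumes the
 value bytes are stored most significant first, and it only handles values
 below 2**57, which is what a run of 0 to 6 `1` bits can hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode `value` as described in the comment above.
// A run of k one bits (0 <= k <= 6), then a zero bit, then the top 7-k
// bits of the value, followed by k+1 more bytes of value (big-endian).
inline std::string pack_sketch(std::uint64_t value)
{
    // Assumption: a run of at most 6 one bits gives 15 + 7*6 = 57 value bits.
    assert(value < (std::uint64_t(1) << 57));

    // Smallest k such that the value fits in 15 + 7*k bits.
    unsigned k = 0;
    while ((value >> (15 + 7 * k)) != 0) ++k;

    std::string s;
    unsigned prefix = (0xffu << (8 - k)) & 0xffu;  // k one bits at the top
    unsigned high = static_cast<unsigned>(value >> (8 * (k + 1)));
    s += static_cast<char>(static_cast<unsigned char>(prefix | high));

    // The remaining k+1 bytes, most significant first so that bytewise
    // comparison of encodings matches numeric comparison of values.
    for (unsigned i = k + 1; i != 0; --i) {
        s += static_cast<char>(
            static_cast<unsigned char>((value >> (8 * (i - 1))) & 0xff));
    }
    return s;
}

// Hypothetical helper: decode a string produced by pack_sketch().
inline std::uint64_t unpack_sketch(const std::string& s)
{
    assert(s.size() >= 2);
    unsigned first = static_cast<unsigned char>(s[0]);

    // Count the leading run of one bits in the first byte.
    unsigned k = 0;
    while (k < 7 && (first & (0x80u >> k))) ++k;
    assert(k <= 6);
    assert(s.size() == k + 2);

    // Low 7-k bits of the first byte are the top bits of the value.
    std::uint64_t value = first & (0x7fu >> k);
    for (std::size_t i = 1; i <= k + 1; ++i) {
        value = (value << 8) | static_cast<unsigned char>(s[i]);
    }
    return value;
}
```

 With this sketch `0x4000`-`0x7fff` encode in 2 bytes and
 `0x20000000`-`0x3fffffff` in 5, matching the sizes given above, and
 comparing two encodings as `std::string` gives the same order as comparing
 the values.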

--
Ticket URL: <http://trac.xapian.org/ticket/686#comment:3>
Xapian <http://xapian.org/>