[Xapian-tickets] [Xapian] #686: Backend support for >= 2**32 documents
Xapian
nobody at xapian.org
Tue Nov 17 01:16:35 GMT 2015
#686: Backend support for >= 2**32 documents
---------------------------+------------------------------
Reporter: olly | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.3.x
Component: Backend-Glass | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
---------------------------+------------------------------
Comment (by olly):
I have a new encoding for `pack_uint_preserving_sort()` which extends up
to 64 bits nicely. For 32-bit values, it takes one fewer byte to encode
`0x4000`-`0x7fff`, but one more byte to encode `0x20000000`-`0x3fffffff`.
Overall that's probably a win for most people, with a small penalty if you
have much more than half a billion documents. It also extends naturally to
64 bits, so this seems a sane trade-off.
The obvious tweak to the current encoding which would allow 64 bits is to
use 3 bits for the width instead of 2, but that is always the same or
worse than my new encoding until you have more than 144 million billion
(about 2**57) documents.
The basic idea is that the first byte of the encoding is a run of 0 to 6
`1` bits, followed by a `0` bit. The remaining bits of that byte store the
most significant bits of the value. The number of `1` bits in the run is
one fewer than the number of bytes of value which follow, so the encoding
is at least 2 bytes long and at most 8 (or at most 5 for 32-bit values).
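
To make the layout concrete, here is a minimal Python sketch of the scheme
as described above. It is an illustration only, not the code in
xapian-core: the function name `pack_uint_preserving_sort_sketch` is
hypothetical, and the sketch assumes the shortest possible encoding is
always chosen and that values needing more than 57 bits (the most a run of
6 `1` bits can hold in 8 bytes) are simply rejected.

```python
def pack_uint_preserving_sort_sketch(value: int) -> bytes:
    """Encode value so that comparing the bytes compares the numbers.

    Hypothetical illustration of the scheme in this comment; handles
    0 <= value < 2**57 only.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    for ones in range(7):          # run of 0 to 6 one bits in the first byte
        trailing = ones + 1        # bytes of value following the first byte
        spare = 7 - ones           # value bits left in the first byte
        if value < 1 << (spare + 8 * trailing):
            # `ones` leading 1 bits, then a 0 bit, then zeros.
            prefix = (0xFF << (8 - ones)) & 0xFF
            first = prefix | (value >> (8 * trailing))
            rest = value & ((1 << (8 * trailing)) - 1)
            return bytes([first]) + rest.to_bytes(trailing, "big")
    raise ValueError("value needs more than 57 bits; not covered here")


if __name__ == "__main__":
    for v in (0, 0x4000, 0x7FFF, 0x8000, 0x1FFFFFFF, 0x20000000, 0xFFFFFFFF):
        print(hex(v), pack_uint_preserving_sort_sketch(v).hex())
```

Under these assumptions 2 bytes hold 15 bits, 3 bytes hold 22, 4 bytes hold
29 and 5 bytes hold 36, which matches the ranges quoted earlier: `0x4000`
fits in 2 bytes, while `0x20000000` needs 5. Because a longer encoding
always has a larger first byte, sorting the encoded strings bytewise gives
the same order as sorting the original values.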
--
Ticket URL: <http://trac.xapian.org/ticket/686#comment:3>
Xapian <http://xapian.org/>