[Xapian-tickets] [Xapian] #686: Backend support for >= 2**32 documents
Xapian
nobody at xapian.org
Tue Nov 17 21:49:19 GMT 2015
#686: Backend support for >= 2**32 documents
---------------------------+------------------------------
Reporter: olly | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.3.x
Component: Backend-Glass | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
---------------------------+------------------------------
Comment (by olly):
I wrote a script to compare the encodings.
* `now` is the current encoding, shown for comparison
* `now3` is the current encoded extended to use a 3 bit length.
* `unary` is my proposed new encoding
* the others are as described in Managing Gigabytes 2nd ed, p117 (golbx
being Golomb with b=x)
The cell values are the number of bytes needed to encode a value with that
many bits with the specified encoding.
The shortest values per row are bold, and the longest italic (it happens
every value is either the best or worst in its row). For rows where all
values are the same, bold and italic aren't used. The `now` encoding is
ignored when calculating the best and worst (since it can't represent the
full 64-bit range).
||=bits =||=now3
=||=unary=||=gamma=||=delta=||=golb3=||=golb6=||=best=||=now=||
||1-13 ||2 ||2 ||2 ||2 ||2 ||2 ||2||(2)||
||14 ||''3'' ||'''2'''||'''2'''||'''2'''||'''2'''||''3'' ||2||(2)||
||15 ||''3'' ||'''2'''||'''2'''||'''2'''||''3'' ||''3'' ||2||(3)||
||16-20 ||3 ||3 ||3 ||3 ||3 ||3 ||3||(3)||
||21 ||'''3'''||'''3'''||'''3'''||''4'' ||'''3'''||'''3'''||3||(3)||
||22 ||''4'' ||'''3'''||''4'' ||''4'' ||''4'' ||''4'' ||3||(3)||
||23-28 ||4 ||4 ||4 ||4 ||4 ||4 ||4||(4)||
||29 ||'''4'''||'''4'''||'''4'''||''5'' ||'''4'''||''5'' ||4||(4)||
||30-35 ||5 ||5 ||5 ||5 ||5 ||5 ||5||(5)||
||36 ||'''5'''||'''5'''||''6'' ||''6'' ||'''5'''||'''5'''||5||(5)||
||37 ||'''5'''||''6'' ||''6'' ||''6'' ||'''5'''||''6'' ||5||(5)||
||38-43 ||6 ||6 ||6 ||6 ||6 ||6 ||6||(5)
for 38 only||
||44 ||'''6'''||''7'' ||''7'' ||''7'' ||'''6'''||'''6'''||6||||
||45 ||'''6'''||''7'' ||''7'' ||''7'' ||''7'' ||''7'' ||6||||
||46-50 ||7 ||7 ||7 ||7 ||7 ||7 ||7||||
||51 ||'''7'''||''8'' ||'''7'''||'''7'''||'''7'''||'''7'''||7||||
||52 ||'''7'''||''8'' ||''8'' ||''8'' ||'''7'''||'''7'''||7||||
||53 ||'''7'''||''8'' ||''8'' ||''8'' ||''8'' ||''8'' ||7||||
||54-57 ||8 ||8 ||8 ||8 ||8 ||8 ||8||||
||58 ||'''8'''||''9'' ||'''8'''||'''8'''||'''8'''||'''8'''||8||||
||59 ||'''8'''||''9'' ||'''8'''||'''8'''||'''8'''||'''8'''||8||||
||60 ||'''8'''||''9'' ||''9'' ||''9'' ||'''8'''||'''8'''||8||||
||61 ||'''8'''||''9'' ||''9'' ||''9'' ||''9'' ||''9'' ||8||||
||62-64 ||9 ||9 ||9 ||9 ||9 ||9 ||9||||
No one encoding is best everywhere, but unary seems as good as any of the
other encodings in the table, and is the only one which is at least as
good as the current encoding for all values currently encoded (i.e. it
wouldn't add overhead to any database you can currently work with).
--
Ticket URL: <http://trac.xapian.org/ticket/686#comment:4>
Xapian <http://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list