[Xapian-tickets] [Xapian] #326: Change doc length chunk encoding so skipping through a chunk is better than O(n)

Xapian nobody at xapian.org
Sun Feb 4 21:21:46 GMT 2018


#326: Change doc length chunk encoding so skipping through a chunk is better than
O(n)
---------------------------+------------------------------
 Reporter:  richard        |             Owner:  olly
     Type:  defect         |            Status:  closed
 Priority:  normal         |         Milestone:  1.5.0
Component:  Backend-Glass  |           Version:  SVN trunk
 Severity:  normal         |        Resolution:  fixed
 Keywords:                 |        Blocked By:
 Blocking:                 |  Operating System:  All
---------------------------+------------------------------
Changes (by olly):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 The new honey backend which I recently merged to master stores document
 lengths with a fixed width encoding.

 Currently it's a fixed 4 bytes per entry, which is somewhat wasteful, but
 for glass a typical document length entry actually needs 3 bytes (because
 the document length values are typically >= 128 and < 16384, which takes 2
 bytes, and we use another byte to store the docid delta, which is always 0
 unless there are deleted documents).
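
 To illustrate why a fixed width helps with skipping, here is a minimal
 sketch, not honey's actual code.  It assumes a chunk covers consecutive
 docids starting at first_did and stores each length as 4 big-endian bytes;
 FixedWidthChunk, append_length() and get_length() are hypothetical names.
 Because every entry has the same size, the offset of a docid's entry can be
 computed directly instead of decoding every entry before it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (an assumption, not honey's on-disk format): the
// chunk covers consecutive docids starting at first_did, and each document
// length occupies exactly 4 big-endian bytes.
struct FixedWidthChunk {
    std::uint32_t first_did;
    std::vector<unsigned char> data;
};

// Append one document length as 4 big-endian bytes.
void append_length(FixedWidthChunk& chunk, std::uint32_t len) {
    for (int shift = 24; shift >= 0; shift -= 8)
        chunk.data.push_back(static_cast<unsigned char>(len >> shift));
}

// Look up the length for docid did in O(1).  Returns false if did lies
// outside the range of docids this chunk covers.
bool get_length(const FixedWidthChunk& chunk, std::uint32_t did,
                std::uint32_t& len_out) {
    assert(chunk.data.size() % 4 == 0);
    if (did < chunk.first_did) return false;
    // Every entry is 4 bytes, so the offset follows from the docid alone.
    std::uint64_t offset = std::uint64_t(did - chunk.first_did) * 4;
    if (offset + 4 > chunk.data.size()) return false;
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len = (len << 8) | chunk.data[static_cast<std::size_t>(offset) + i];
    len_out = len;
    return true;
}
```

 With a variable-length encoding, the same lookup has to walk the chunk from
 its start, which is the O(n) skipping cost this ticket is about.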

 My plan is to allow the width to vary per chunk - I'm not sure whether at
 byte or bit granularity, or somewhere between.  Bit granularity is obviously
 more compact, but the doclen data is not a huge amount of data, so the
 additional saving may not justify the increased complexity (and hence
 encoding and decoding time).
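
 As a rough sketch of the byte-granularity option only (an illustration
 under assumptions, not a committed design), each chunk could record the
 smallest number of bytes that holds its largest document length and store
 every entry at that width.  PackedChunk, bytes_needed(), pack() and
 unpack() are hypothetical names, and big-endian byte order is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk with a per-chunk entry width in whole bytes (1 to 4).
struct PackedChunk {
    unsigned width;
    std::vector<unsigned char> data;
};

// Smallest number of whole bytes (1 to 4) that can represent max_value.
unsigned bytes_needed(std::uint32_t max_value) {
    unsigned width = 1;
    while (width < 4 && (max_value >> (8 * width)) != 0) ++width;
    return width;
}

// Encode all lengths at the width required by the largest one.
PackedChunk pack(const std::vector<std::uint32_t>& lengths) {
    std::uint32_t max_value = 0;
    for (std::uint32_t len : lengths) max_value = std::max(max_value, len);
    PackedChunk chunk{bytes_needed(max_value), {}};
    chunk.data.reserve(lengths.size() * chunk.width);
    for (std::uint32_t len : lengths)
        for (unsigned i = chunk.width; i-- > 0;)  // most significant byte first
            chunk.data.push_back(static_cast<unsigned char>(len >> (8 * i)));
    return chunk;
}

// Entries still have a uniform size within the chunk, so access by index
// stays O(1).
std::uint32_t unpack(const PackedChunk& chunk, std::size_t index) {
    assert(chunk.width >= 1 && chunk.width <= 4);
    assert((index + 1) * chunk.width <= chunk.data.size());
    std::uint32_t len = 0;
    for (unsigned i = 0; i < chunk.width; ++i)
        len = (len << 8) | chunk.data[index * chunk.width + i];
    return len;
}
```

 Under this scheme a chunk whose lengths are all below 65536 would use 2
 bytes per entry rather than 4.  Bit granularity would replace the byte
 loops with shifts and masks across byte boundaries, which is where the
 extra complexity mentioned above comes from.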

 But while there's scope to improve further, the issue here is now
 addressed so closing.

--
Ticket URL: <https://trac.xapian.org/ticket/326#comment:26>
Xapian <https://xapian.org/>
Xapian