[Xapian-tickets] [Xapian] #822: Honey format tweaks
Xapian
nobody at xapian.org
Wed Aug 16 04:16:59 BST 2023
#822: Honey format tweaks
----------------------------------+------------------------
        Reporter:  Olly Betts     |      Owner:  Olly Betts
            Type:  defect         |     Status:  new
        Priority:  normal         |  Milestone:  1.5.0
       Component:  Backend-Honey  |    Version:
        Severity:  normal         |   Keywords:
      Blocked By:                 |   Blocking:
Operating System:  All            |
----------------------------------+------------------------
The encoding of spelling "tail" and "bookend" term lists could be
improved.
In honey, the spelling data encoding relies on the last 2 (for tail) or
1 (for bookend) bytes being fixed and recoverable from the key, but we
still store a reuse byte for the first entry.
This could reuse up to two bytes, but it usually won't save any and it
takes a byte to store, so overall it costs us slightly under one byte per
tail and per bookend term list.

The number of such term lists is less than twice the number of spelling
targets (typically significantly less, since many words have the same
last two bytes or the same first and last byte), so it's not a vast
saving. For example, the largest spelling data table I have to hand is
from recoll, which has 494633 spelling targets but only 1617 bookends and
1802 tails, so the saving there would be at most 3419 bytes.

However, supporting this also complicates decoding, because it is
possible for the reuse and the tail to overlap (we weren't handling this
situation correctly until 99873ea22f22e8cb99d4f1db2d6591c2f725afa8), so
we really should sort it out at some point.
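
As a rough check of the figures above, here is a minimal Python sketch of
the upper bound on the saving. It assumes exactly one reuse byte is stored
per tail term list and per bookend term list; the function and variable
names are made up for illustration and are not part of Xapian.

```python
def max_reuse_byte_saving(bookend_lists: int, tail_lists: int) -> int:
    """Upper bound in bytes, assuming one reuse byte per term list."""
    return bookend_lists + tail_lists


# Figures quoted above for the recoll spelling table.
spelling_targets = 494633
bookend_lists = 1617
tail_lists = 1802

saving = max_reuse_byte_saving(bookend_lists, tail_lists)
print(saving)  # 3419

# The bound is far below the worst case of two lists per spelling target.
print(saving < 2 * spelling_targets)  # True

# Average saving in bytes per spelling target (a small fraction of a byte).
print(saving / spelling_targets)
```

This only bounds the size change; it says nothing about the decoder
simplification, which is the stronger argument here.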
--
Ticket URL: <https://trac.xapian.org/ticket/822>
Xapian <https://xapian.org/>