[Xapian-devel] Posting list encoding improvements - pfd encoding & var len encoding comparison program

Olly Betts olly at survex.com
Fri Apr 20 04:58:20 BST 2012


On Thu, Apr 19, 2012 at 11:12:32PM -0400, Weixian Zhou wrote:
> 3. The implemented fixed length encoding uses 4 bytes as fixed length.
> This is not optimal and can be further optimized in PFD.
[...]
> Fix a typo in the attachment: The search time of 100000 searches of
> variable length encoding and fixed length encoding are reversed.

That's promising, I think - the fixed length encoding is quite a bit
faster, but almost exactly twice the size of the variable length
encoding when using a 4 byte fixed size.

But document lengths will probably fit in 2 bytes in many situations
(and should almost never need more than 3), so even a simple per-chunk
choice of the number of bytes to use should often bring the fixed length
encoding to about the same size as the variable length one.
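
To make the per-chunk idea concrete, here is a rough sketch of what I
mean. It is only an illustration: the function names (bytes_needed,
encode_chunk, decode_chunk), the one-byte width header and the
little-endian layout are all assumptions made up for this example, not
existing Xapian code or the format used in the comparison program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: smallest number of bytes (1..4) that can hold
// every value in the chunk.
unsigned bytes_needed(const std::vector<uint32_t>& lengths) {
    uint32_t max_len = 0;
    for (uint32_t len : lengths) {
        if (len > max_len) max_len = len;
    }
    unsigned width = 1;
    // Stop at 4 so we never shift a 32-bit value by 32 bits.
    while (width < 4 && (max_len >> (8 * width)) != 0) ++width;
    return width;
}

// Hypothetical chunk layout (an assumption for this sketch): one header
// byte holding the width, then each value stored little-endian in
// exactly that many bytes.
std::string encode_chunk(const std::vector<uint32_t>& lengths) {
    unsigned width = bytes_needed(lengths);
    assert(width >= 1 && width <= 4);
    std::string out;
    out.reserve(1 + width * lengths.size());
    out.push_back(static_cast<char>(width));
    for (uint32_t len : lengths) {
        for (unsigned i = 0; i < width; ++i) {
            out.push_back(static_cast<char>((len >> (8 * i)) & 0xff));
        }
    }
    return out;
}

// Inverse of encode_chunk: read the width from the header byte, then
// rebuild each value from its fixed-size slot.
std::vector<uint32_t> decode_chunk(const std::string& chunk) {
    std::vector<uint32_t> lengths;
    if (chunk.empty()) return lengths;
    unsigned width = static_cast<unsigned char>(chunk[0]);
    assert(width >= 1 && width <= 4);
    for (std::size_t pos = 1; pos + width <= chunk.size(); pos += width) {
        uint32_t len = 0;
        for (unsigned i = 0; i < width; ++i) {
            uint32_t byte = static_cast<unsigned char>(chunk[pos + i]);
            len |= byte << (8 * i);
        }
        lengths.push_back(len);
    }
    return lengths;
}
```

With this layout a chunk whose largest document length is below 65536
costs 2 bytes per entry plus the header byte, and random access within
the chunk stays a simple multiply by the width.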

Cheers,
    Olly


