[Xapian-devel] Posting list encoding improvements - pfd encoding & var len encoding comparison program

Weixian Zhou ideazwx at gmail.com
Fri Apr 20 03:51:45 BST 2012


Hi all,
I wrote a program that implements variable-length encoding and fixed-length
encoding, and compares their index size and the speed of looking up a
document length.
The comparison results are in the attached snapshot.
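
To illustrate the index size side of the comparison, here is a minimal
sketch and not the code from the program itself. It assumes a generic 7-bit
variable-byte scheme for the variable-length case and 4-byte little-endian
unsigned integers for the fixed-length case. All function names and the
sample postings are made up for illustration.

```python
import struct

def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low bits first.
    The high bit of a byte is set when more bytes follow."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_varint(buf, pos=0):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7

def encode_postings_varint(postings):
    """postings: list of (doc_id_gap, doc_len) pairs (assumed layout)."""
    out = bytearray()
    for gap, doc_len in postings:
        out += encode_varint(gap)
        out += encode_varint(doc_len)
    return bytes(out)

def decode_postings_varint(buf):
    """Inverse of encode_postings_varint."""
    postings = []
    pos = 0
    while pos < len(buf):
        gap, pos = decode_varint(buf, pos)
        doc_len, pos = decode_varint(buf, pos)
        postings.append((gap, doc_len))
    return postings

def encode_postings_fixed(postings):
    """Store every value as a 4-byte little-endian unsigned integer."""
    out = bytearray()
    for gap, doc_len in postings:
        out += struct.pack("<II", gap, doc_len)
    return bytes(out)

# Hypothetical sample data: (doc id gap, doc length)
postings = [(1, 120), (5, 300), (130, 45), (2, 20000)]
var_bytes = encode_postings_varint(postings)
fixed_bytes = encode_postings_fixed(postings)
print(len(var_bytes), len(fixed_bytes))
```

For this sample the variable-length form takes 12 bytes and the fixed-length
form takes 32, since small gaps and lengths fit in one or two bytes each.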

1. The posting list is held entirely in memory.
2. The search strategy for the fixed-length encoding is to skip with
exponentially growing steps (1, 2, 4, 8, ...). Once a skip would pass the
desired doc id, the search goes back to the previous position and advances
with step 1.
3. The fixed-length encoding implemented here uses 4 bytes as the fixed
length. This is not optimal and can be further optimized with PFD.
4. The program generates uniformly random doc id gaps and doc lengths to
build the posting list.
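
The search strategy in point 2 can be sketched roughly as follows. This is
only an illustration under some assumptions: each fixed-length entry is
taken to hold an absolute doc id and a doc length as two 4-byte unsigned
integers, sorted by doc id, and the names ENTRY, build_fixed and
find_doc_len are hypothetical, not taken from the program.

```python
import struct

# Assumed entry layout: (doc_id, doc_len), 4 bytes each, little-endian.
ENTRY = struct.Struct("<II")

def build_fixed(postings):
    """Pack (doc_id, doc_len) pairs, sorted by doc_id, into fixed-width entries."""
    return b"".join(ENTRY.pack(doc_id, doc_len) for doc_id, doc_len in postings)

def doc_id_at(buf, i):
    """Read the doc id of entry i directly, without decoding earlier entries."""
    return ENTRY.unpack_from(buf, i * ENTRY.size)[0]

def find_doc_len(buf, target):
    """Return the doc length stored for target, or None if it is absent."""
    n = len(buf) // ENTRY.size
    pos = 0
    step = 1
    # Skip phase: jump ahead by 1, 2, 4, 8, ... entries while the entry we
    # would land on does not exceed the target doc id.
    while pos + step < n and doc_id_at(buf, pos + step) <= target:
        pos += step
        step *= 2
    # Linear phase: from the last safe position, advance with step 1.
    while pos < n:
        doc_id, doc_len = ENTRY.unpack_from(buf, pos * ENTRY.size)
        if doc_id == target:
            return doc_len
        if doc_id > target:
            return None
        pos += 1
    return None

# Hypothetical sample data: (doc id, doc length)
sample = [(3, 10), (7, 20), (8, 30), (15, 40), (21, 50),
          (22, 60), (40, 70), (41, 80), (90, 90)]
buf = build_fixed(sample)
print(find_doc_len(buf, 22))
```

Random access by entry index is what makes the skipping possible here. With
a variable-length encoding the entries would have to be decoded one by one,
unless extra skip information is stored alongside them.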

You can access the code on my GitHub:
https://github.com/zwxxx/pfd_simple_test
-- 
Weixian Zhou
Department of Computer Science and Engineering
University at Buffalo, SUNY
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare.png
Type: image/png
Size: 6772 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120419/72da0ab2/attachment.png>