[Xapian-devel] Posting list encoding improvements - pfd encoding & var len encoding comparison program

Weixian Zhou ideazwx at gmail.com
Fri Apr 20 04:12:32 BST 2012


Fix a typo in the attachment: The search time of 100000 searches of
variable length encoding and fixed length encoding are reversed.
For convenience, I attached all source files.


On Thu, Apr 19, 2012 at 10:51 PM, Weixian Zhou <ideazwx at gmail.com> wrote:

> Hi all,
> I wrote a program that implement the variable length encoding and fixed
> length encoding, and compares their index size and speed of search doc
> length.
> You can see the comparison result from the attachment snapshot.
>
> 1. The posting list is in all memory;
> 2. The search strategy of fixed length encoding is skipping with
> exponential step (1, 2, 4, 8, ...). Once exceeds the desired doc id, back
> to previous step and skip with step 1.
> 3. The implemented fixed length encoding uses 4 bytes as fixed length.
> This is not optimal and can be further optimized in PFD.
> 4. The program generates uniform random doc id gap and doc len to make
> posting list.
>
> *You can access the code via my github:
> https://github.com/zwxxx/pfd_simple_test*
> --
> Weixian Zhou
> Department of Computer Science and Engineering
> University at Buffalo, SUNY
>
>


-- 
Weixian Zhou
Department of Computer Science and Engineering
University at Buffalo, SUNY
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120419/c9950a69/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: src.zip
Type: application/zip
Size: 4654 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120419/c9950a69/attachment-0001.zip>


More information about the Xapian-devel mailing list