[Xapian-devel] some trouble when devising skiplist

Olly Betts olly at survex.com
Sun May 11 08:12:26 BST 2014


On Sat, May 10, 2014 at 11:12:12AM +0800, Hurricane Tong wrote:
> I was confronted with some trouble, I describe the trouble in my journal
> http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10
> And corresponding code is in my git.

How big is your encoded list + skip list?

I wonder if a skip list is not worth the complications it causes on
update.

If we optimise for the case of used document ids being mostly contiguous
(which is often the case), we could just store a list of the lengths
(with a dummy value for any gaps).  If we used a fixed-width format,
then looking up a document length is O(1) (it's effectively an array
subscripting operation), and that'll definitely be faster than
navigating via a skip list.

And that's also easy to update (the hardest case is when a new document
length exceeds the fixed size, and you need to reencode or split).

E.g. if all the doc lengths in a chunk were < 65536 (which is often
going to be true), you would need 2 bytes per document.  The current
docid delta + vint encoded document length will need at least 2 bytes
per document, plus whatever space the skip list takes up.

Cheers,
    Olly



More information about the Xapian-devel mailing list