[Xapian-devel] some testing results

Olly Betts olly at survex.com
Mon Mar 10 01:06:00 GMT 2014


[There's no need to Cc me, I am definitely subscribed to xapian-devel!]

On Sun, Mar 09, 2014 at 02:55:18PM +0800, Hurricane Tong wrote:
> I do some tests on the VSEncoding and Interpolative encoding.
> I randomly generate 225284 numbers in ascending sort, and I encode
> these numbers with VSEncoder and Interpolative encoder, and compare
> the time for decoding.

Randomly generated data probably has different characteristics from
real posting lists.  To give you some real data to play with, I've
dumped out the postlist for the term "linux" from gmane's search -
there's a text file with one integer per line here:

http://search.gmane.org/~xapian/linux.postlist.gz

There are 16170833 lines, compressed size is 37MB (uncompressed is
about 139MB).

I can easily dump out data for other terms, and can include
corresponding wdfs too if they are useful.

> The size of string generated by VSEncoder is 234720,
> while the size of string generated by InterpolativeEncoder is 225854.

OK, and that's roughly what the paper suggests.

> When compiling without -o2, 371ms is needed for VSDecoder, while
> 1042ms is needed for Interpolative.

I think the speed when compiled without optimisations is close to
irrelevant.  I wouldn't bother to test that case.

> But when using -o2, VSDecoder need 20ms, InterpolativeDecoder need 13ms.

Is this using Xapian's code for the interpolative decoding?

I'd suggest also timing cases where the whole list isn't decoded (e.g.
maybe try with decoding 20%, 40%, 60% and 80% of the list), as we
often don't need to decode the whole of postlist chunk, but with
interpolative code we need to decode disproportionately more of the
encoded data for partial cases, especially when we only want a small
amount at the start.

> Besides, VSEncoder costs much more time to encode than
> InterpolativeEncoder.
> 
> So I think VSEncoding algorithm is promising, but I need to optimize
> my code...

Yes - the paper suggests you should be able to do better, and the
interpolative coder/decoder in Xapian has had some optimisation effort
put into it.

Cheers,
    Olly



More information about the Xapian-devel mailing list