[Xapian-devel] Optimized VSEncoding

Olly Betts olly at survex.com
Thu Mar 13 10:49:23 GMT 2014


On Thu, Mar 13, 2014 at 10:41:02AM +0800, Hurricane Tong wrote:
> When only encoding the first 1000lines, 
> both cost 0ms to decode and VS cost 1ms to encode while Interpolative cost 0ms,
> 1000lines is just too little to catch the difference in my test.

As Richard says, there are better ways to do timing tests than calling
a fairly low resolution timer like clock() before and after a single
run.

We also recommend using Linux for development work rather than Windows.
That would help here as there are better open source profiling tools
available on Linux.

> I upload the source code to https://github.com/HurricaneTong/Xapian/tree/master/VSEncoder

Thanks.

You're inserting repeatedly at the start of vector S - each time you do
that the entire current contents of the vector have to be copied up one
place to make room, which means that inserting N items has cost O(N*N).

But vector::push_back() is O(1), so pushing back N items has cost O(N).
And we can just flip the loop which uses the items to run backwards
through S instead of forwards.

I've attached a quick patch to show what I mean.  I haven't checked the
output is the same as before, so this might not be quite right - the
patch is really just to make it clearer what I'm talking about.

Making this change reduces the encoding time by a factor of about 3.6
for me for the linux.postlist case (though if the patch isn't quite
right this reduction may not be either).  And that's still ~14 times as
long as interpolative coding takes, but I suspect there's further scope
for speeding it up.

Cheers,
    Olly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vsencoder-speedup.patch
Type: text/x-diff
Size: 811 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20140313/ee297f71/attachment-0001.patch>


More information about the Xapian-devel mailing list