[Xapian-devel] Optimized VSEncoding
Olly Betts
olly at survex.com
Thu Mar 13 10:49:23 GMT 2014
On Thu, Mar 13, 2014 at 10:41:02AM +0800, Hurricane Tong wrote:
> When only encoding the first 1000lines,
> both cost 0ms to decode and VS cost 1ms to encode while Interpolative cost 0ms,
> 1000lines is just too little to catch the difference in my test.
As Richard says, there are better ways to do timing tests than calling
a fairly low resolution timer like clock() before and after a single
run.
We also recommend using Linux for development work rather than Windows.
That would help here as there are better open source profiling tools
available on Linux.
> I upload the source code to https://github.com/HurricaneTong/Xapian/tree/master/VSEncoder
Thanks.
You're inserting repeatedly at the start of vector S - each time you do
that the entire current contents of the vector have to be copied up one
place to make room, which means that inserting N items has cost O(N*N).
But vector::push_back() is O(1), so pushing back N items has cost O(N).
And we can just flip the loop which uses the items to run backwards
through S instead of forwards.
I've attached a quick patch to show what I mean. I haven't checked the
output is the same as before, so this might not be quite right - the
patch is really just to make it clearer what I'm talking about.
Making this change reduces the encoding time by a factor of about 3.6
for me for the linux.postlist case (though if the patch isn't quite
right this reduction may not be either). And that's still ~14 times as
long as interpolative coding takes, but I suspect there's further scope
for speeding it up.
Cheers,
Olly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vsencoder-speedup.patch
Type: text/x-diff
Size: 811 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20140313/ee297f71/attachment-0001.patch>
More information about the Xapian-devel
mailing list