[Xapian-devel] Optimized VSEncoding
Hurricane Tong
zhangshangtong.cpp at qq.com
Fri Mar 14 01:03:00 GMT 2014
Hi,
Thanks for your patch, it works well, the output is the same with what it is before.
To do further optimization, I try to use more subtle bit operation in encoding, it does make a little progress.
The get_optimal_split cost 72% of the time for encoding, but I can hardly apply more optimization to it.
I will try to optimize the dynamic programming algorithm.
>>We also recommend using Linux for development work rather than Windows.
I have a ubuntu installed in my machine, but in ubuntu, my machine cannot awake from hiberating, sleeping....
I tried many methods but finally failed, so if not necessary, I don't switch to ubuntu.
Best
Regards
------------------
Shangtong Zhang,Second Year Undergraduate,
School of Computer Science,
Fudan University, China.
------------------ Original ------------------
From: "Olly Betts";<olly at survex.com>;
Date: Thu, Mar 13, 2014 06:49 PM
To: "Hurricane Tong"<zhangshangtong.cpp at qq.com>;
Cc: "xapian-devel"<xapian-devel at lists.xapian.org>;
Subject: Re: [Xapian-devel] Optimized VSEncoding
On Thu, Mar 13, 2014 at 10:41:02AM +0800, Hurricane Tong wrote:
> When only encoding the first 1000lines,
> both cost 0ms to decode and VS cost 1ms to encode while Interpolative cost 0ms,
> 1000lines is just too little to catch the difference in my test.
As Richard says, there are better ways to do timing tests than calling
a fairly low resolution timer like clock() before and after a single
run.
We also recommend using Linux for development work rather than Windows.
That would help here as there are better open source profiling tools
available on Linux.
> I upload the source code to https://github.com/HurricaneTong/Xapian/tree/master/VSEncoder
Thanks.
You're inserting repeatedly at the start of vector S - each time you do
that the entire current contents of the vector have to be copied up one
place to make room, which means that inserting N items has cost O(N*N).
But vector::push_back() is O(1), so pushing back N items has cost O(N).
And we can just flip the loop which uses the items to run backwards
through S instead of forwards.
I've attached a quick patch to show what I mean. I haven't checked the
output is the same as before, so this might not be quite right - the
patch is really just to make it clearer what I'm talking about.
Making this change reduces the encoding time by a factor of about 3.6
for me for the linux.postlist case (though if the patch isn't quite
right this reduction may not be either). And that's still ~14 times as
long as interpolative coding takes, but I suspect there's further scope
for speeding it up.
Cheers,
Olly
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20140314/2376a902/attachment.html>
More information about the Xapian-devel
mailing list