[Xapian-tickets] [Xapian] #782: Improve encode_length() performance
Xapian
nobody at xapian.org
Wed May 8 08:15:19 BST 2019
#782: Improve encode_length() performance
----------------------------+-------------------------
Reporter: Kronuz | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Backend-Remote | Version:
Severity: minor | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
----------------------------+-------------------------
Comment (by olly):
I've done some more tests, and also now tested on x86-64 Linux.
The tl;dr version is that it looks to me like your code is probably better
with clang, but probably slower with GCC. And that's clearly problematic.
I made some changes to the benchmark code. I don't have actual data, but
I think the full range of `unsigned long long` is not a realistic test.
Most encoded values will be term lengths or wdf values, both of which I'd
expect to mostly fit in one byte. So I changed the start point to be
`0xfff`. Using `unsigned long long` is not a representative test, as
almost all the types encoded will be 32 bit by default.
I also changed the benchmark loop to what the libbenchmark docs recommend
unless you need C++98 compatibility (and they especially recommend for
benchmarked code that doesn't take very long to execute, like ours).
With that here's what I get on Linux:
{{{
$ clang++-7 -std=c++1z -pedantic -Wall -Wextra -O3 -lbenchmark
-lbenchmark_main -o benchmark-encode_length ./benchmark-encode_length.cc
&& ./benchmark-encode_length
2019-05-08 18:50:50
Running ./benchmark-encode_length
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 155 ns 155 ns 4506744
BM_EncodeLength_Optimized 113 ns 113 ns 6190602
$ g++ -std=c++1z -pedantic -Wall -Wextra -O3 -lbenchmark -lbenchmark_main
-o benchmark-encode_length ./benchmark-encode_length.cc && ./benchmark-
encode_length
2019-05-08 18:50:59
Running ./benchmark-encode_length
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 88 ns 88 ns 7929005
BM_EncodeLength_Optimized 117 ns 117 ns 5926981
}}}
So your optimised version improves with clang but regresses with GCC.
Interestingly your version seems to have pretty much the same performance
with both compilers, but the current code works much better with GCC than
clang.
BTW, this clang using libstdc++ - I tried with `-stdlib=libc++` but the
built program segfaults.
I see essentially the same picture on x86 Linux.
So where do we go from here?
I think gathering some representative data on the values which get encoded
would be useful, so we can actually make sure we're benchmarking something
realistic. In particular if the single byte case dominates that makes a
difference.
Having some real data could also inform a decision about whether the
current encoding is actually a good choice. Using the simpler format
which `pack_uint()` produces is an option for example.
Do you have a real world xapiand deployment you could log such data from?
If not, I can see if I can probably find somewhere to get suitable data
from. Or perhaps we both should to get some idea of how much variability
there is.
--
Ticket URL: <https://trac.xapian.org/ticket/782#comment:7>
Xapian <https://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list