[Xapian-tickets] [Xapian] #782: Improve encode_length() performance
Xapian
nobody at xapian.org
Fri Jun 7 03:25:30 BST 2019
#782: Improve encode_length() performance
----------------------------+-----------------------------
Reporter: Kronuz | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone:
Component: Backend-Remote | Version:
Severity: minor | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
----------------------------+-----------------------------
Comment (by olly):
Tweaking the remote protocol to not send the sort slot when we aren't
sorting by value at all gives me:
{{{
Data points: 1579310
Top 20 values with frequencies:
590387 1
230343 4
149371 3
99771 0
25724 2
17371 6
13834 18
13548 8
11340 5
8428 7
8179 40
7948 61
6418 9
5941 11
5674 90
5243 21
4801 10
4376 17
3804 16
3573 15
}}}
121954 of those are > 254, which is 7.72%. The mean is now
39436.6589466286.
I looked at some real world data (334230 documents from English wikipedia
over two databases) and those have a lot more larger values.
However calculating the size difference between `encode_length()` and
`pack_uint()`, the latter is about 21% smaller for the real world data and
42% smaller for the testsuite data. For your data it would be a little
larger (since `encode_length()` is always one byte there, while
`pack_uint()` needs two for values >= 128) - I can't calculate exactly and
don't see an easy way to approximate, but given the mean, median and SD
there can't be many values >= 128.
And `pack_uint()` is faster in your tests, and mine confirm that - on
x86-64 Linux I get:
{{{
olly at gemse:~/git/xapian$ g++ -O2 -W -Wall benchmark-encode_length.3.cc
-lbenchmark -lbenchmark_main
olly at gemse:~/git/xapian$ ./a.out
2019-06-07 13:23:14
Running ./a.out
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 47 ns 47 ns 14689866
BM_EncodeLength_Optimized 50 ns 50 ns 14041593
BM_EncodeLength_Pack 23 ns 23 ns 30639009
BM_EncodeLength_Char 15 ns 15 ns 45907718
olly at gemse:~/git/xapian$ g++ -O3 -W -Wall benchmark-encode_length.3.cc
-lbenchmark -lbenchmark_main
olly at gemse:~/git/xapian$ ./a.out
2019-06-07 13:23:29
Running ./a.out
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 46 ns 46 ns 15182687
BM_EncodeLength_Optimized 47 ns 47 ns 15025481
BM_EncodeLength_Pack 23 ns 22 ns 30794134
BM_EncodeLength_Char 9 ns 9 ns 76277125
olly at gemse:~/git/xapian$ g++ --version
g++ (Debian 8.3.0-7) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
olly at gemse:~/git/xapian$ clang++-7 -O2 -W -Wall benchmark-
encode_length.3.cc -lbenchmark -lbenchmark_main
olly at gemse:~/git/xapian$ ./a.out
2019-06-07 13:24:10
Running ./a.out
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 60 ns 60 ns 11231889
BM_EncodeLength_Optimized 61 ns 61 ns 11449092
BM_EncodeLength_Pack 23 ns 23 ns 30410274
BM_EncodeLength_Char 10 ns 10 ns 70272189
olly at gemse:~/git/xapian$ clang++-7 -O3 -W -Wall benchmark-
encode_length.3.cc -lbenchmark -lbenchmark_main
olly at gemse:~/git/xapian$ ./a.out
2019-06-07 13:24:20
Running ./a.out
Run on (8 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------
BM_EncodeLength_Original 63 ns 63 ns 11215143
BM_EncodeLength_Optimized 64 ns 64 ns 11054774
BM_EncodeLength_Pack 23 ns 23 ns 29497909
BM_EncodeLength_Char 10 ns 10 ns 71177986
}}}
We can't switch to the even faster "Char" encoding, since that can't
represent values > 255.
So I think we should switch to `pack_uint` for the remote protocol. It's
faster, looks like it will usually be more compact, and it's one fewer
encoding to have to worry about.
--
Ticket URL: <https://trac.xapian.org/ticket/782#comment:13>
Xapian <https://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list