[Xapian-tickets] [Xapian] #782: Improve encode_length() performance

Fri Jun 7 03:25:30 BST 2019

#782: Improve encode_length() performance
----------------------------+-----------------------------
 Reporter:  Kronuz          |             Owner:  olly
     Type:  enhancement     |            Status:  assigned
 Priority:  normal          |         Milestone:
Component:  Backend-Remote  |           Version:
 Severity:  minor           |        Resolution:
 Keywords:                  |        Blocked By:
 Blocking:                  |  Operating System:  All
----------------------------+-----------------------------

Comment (by olly):

 Tweaking the remote protocol to not send the sort slot when we aren't
 sorting by value at all gives me:

 {{{
 Data points: 1579310
 Top 20 values with frequencies:
  590387 1
  230343 4
  149371 3
   99771 0
   25724 2
   17371 6
   13834 18
   13548 8
   11340 5
    8428 7
    8179 40
    7948 61
    6418 9
    5941 11
    5674 90
    5243 21
    4801 10
    4376 17
    3804 16
    3573 15
 }}}

 121954 of those are > 254, which is 7.72%.  The mean is now
 39436.6589466286.

 I looked at some real world data (334230 documents from English wikipedia
 over two databases) and those have a lot more larger values.

 However calculating the size difference between `encode_length()` and
 `pack_uint()`, the latter is about 21% smaller for the real world data and
 42% smaller for the testsuite data.  For your data it would be a little
 larger (since `encode_length()` is always one byte there, while
 `pack_uint()` needs two for values >= 128) - I can't calculate exactly and
 don't see an easy way to approximate, but given the mean, median and SD
 there can't be many values >= 128.

 And `pack_uint()` is faster in your tests, and mine confirm that - on
 x86-64 Linux I get:

 {{{
 olly at gemse:~/git/xapian$ g++ -O2 -W -Wall benchmark-encode_length.3.cc
 -lbenchmark -lbenchmark_main
 olly at gemse:~/git/xapian$ ./a.out
 2019-06-07 13:23:14
 Running ./a.out
 Run on (8 X 3900 MHz CPU s)
 CPU Caches:
   L1 Data 32K (x4)
   L1 Instruction 32K (x4)
   L2 Unified 256K (x4)
   L3 Unified 8192K (x1)
 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements
 may be noisy and will incur extra overhead.
 ***WARNING*** Library was built as DEBUG. Timings may be affected.
 -----------------------------------------------------------------
 Benchmark                          Time           CPU Iterations
 -----------------------------------------------------------------
 BM_EncodeLength_Original          47 ns         47 ns   14689866
 BM_EncodeLength_Optimized         50 ns         50 ns   14041593
 BM_EncodeLength_Pack              23 ns         23 ns   30639009
 BM_EncodeLength_Char              15 ns         15 ns   45907718
 olly at gemse:~/git/xapian$ g++ -O3 -W -Wall benchmark-encode_length.3.cc
 -lbenchmark -lbenchmark_main
 olly at gemse:~/git/xapian$ ./a.out
 2019-06-07 13:23:29
 Running ./a.out
 Run on (8 X 3900 MHz CPU s)
 CPU Caches:
   L1 Data 32K (x4)
   L1 Instruction 32K (x4)
   L2 Unified 256K (x4)
   L3 Unified 8192K (x1)
 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements
 may be noisy and will incur extra overhead.
 ***WARNING*** Library was built as DEBUG. Timings may be affected.
 -----------------------------------------------------------------
 Benchmark                          Time           CPU Iterations
 -----------------------------------------------------------------
 BM_EncodeLength_Original          46 ns         46 ns   15182687
 BM_EncodeLength_Optimized         47 ns         47 ns   15025481
 BM_EncodeLength_Pack              23 ns         22 ns   30794134
 BM_EncodeLength_Char               9 ns          9 ns   76277125
 olly at gemse:~/git/xapian$ g++ --version
 g++ (Debian 8.3.0-7) 8.3.0
 Copyright (C) 2018 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
 PURPOSE.

 olly at gemse:~/git/xapian$ clang++-7 -O2 -W -Wall benchmark-
 encode_length.3.cc -lbenchmark -lbenchmark_main
 olly at gemse:~/git/xapian$ ./a.out
 2019-06-07 13:24:10
 Running ./a.out
 Run on (8 X 3900 MHz CPU s)
 CPU Caches:
   L1 Data 32K (x4)
   L1 Instruction 32K (x4)
   L2 Unified 256K (x4)
   L3 Unified 8192K (x1)
 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements
 may be noisy and will incur extra overhead.
 ***WARNING*** Library was built as DEBUG. Timings may be affected.
 -----------------------------------------------------------------
 Benchmark                          Time           CPU Iterations
 -----------------------------------------------------------------
 BM_EncodeLength_Original          60 ns         60 ns   11231889
 BM_EncodeLength_Optimized         61 ns         61 ns   11449092
 BM_EncodeLength_Pack              23 ns         23 ns   30410274
 BM_EncodeLength_Char              10 ns         10 ns   70272189
 olly at gemse:~/git/xapian$ clang++-7 -O3 -W -Wall benchmark-
 encode_length.3.cc -lbenchmark -lbenchmark_main
 olly at gemse:~/git/xapian$ ./a.out
 2019-06-07 13:24:20
 Running ./a.out
 Run on (8 X 3900 MHz CPU s)
 CPU Caches:
   L1 Data 32K (x4)
   L1 Instruction 32K (x4)
   L2 Unified 256K (x4)
   L3 Unified 8192K (x1)
 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements
 may be noisy and will incur extra overhead.
 ***WARNING*** Library was built as DEBUG. Timings may be affected.
 -----------------------------------------------------------------
 Benchmark                          Time           CPU Iterations
 -----------------------------------------------------------------
 BM_EncodeLength_Original          63 ns         63 ns   11215143
 BM_EncodeLength_Optimized         64 ns         64 ns   11054774
 BM_EncodeLength_Pack              23 ns         23 ns   29497909
 BM_EncodeLength_Char              10 ns         10 ns   71177986
 }}}

 We can't switch to the even faster "Char" encoding, since that can't
 represent values > 255.

 So I think we should switch to `pack_uint` for the remote protocol.  It's
 faster, looks like it will usually be more compact, and it's one fewer
 encoding to have to worry about.

--
Ticket URL: <https://trac.xapian.org/ticket/782#comment:13>
Xapian <https://xapian.org/>
Xapian