xapian-core 1.4.21: 20 tests fail on i686

Wed Oct 26 23:42:51 BST 2022

On Wed, Oct 26, 2022 at 03:37:38PM +0300, Vitaly Chikunov wrote:
> > If you insist on using --disable-sse, the simplest solution is to not
> > run the testsuite.  (The purpose of the testsuite is to find bugs; an
> > effect of --disable-sse is essentially to introduce bugs...)
> 
> I had no intention of producing buggy packages, and did not expect
> (looking sane) configure option to intentionally introduce bugs (and not
> just slowness).
> JFYI, some other distros package while compiling with `--disable-sse`:
> 
>   OpenSUSE: https://build.opensuse.org/package/view_file/openSUSE:Factory/xapian-core/xapian-core.spec?expand=1
>   OpenMandriva: https://github.com/OpenMandrivaAssociation/xapian-core/blob/rolling/xapian-core.spec

Sigh, I guess people don't read the documentation.  Perhaps we need to
make `--disable-sse` emit suitably scary warnings.

> > I'm not sure what the ALT Linux baseline for i686 is, but if you really
> > need to build binary packages which will run on processors without SSE,
> > I'd strongly recommend the approach describing in the last entry here:
> > 
> > https://trac.xapian.org/wiki/PackagingXapian
> 
> I understand this approach (this is perhaps how Debian packages), but
> think users of old hardware still want correctness (at least internal to
> the library).

To be clear, Debian doesn't mandate this approach.

However, I really do think it's the right approach for packages of
xapian-core to provide a build without these problems for the vast
majority of x86 users who do have a CPU with SSE.  Having two builds
is partly orthogonal to how to try to address this for CPUs without SSE,
but it does mean solutions which give better correctness but with a
significant runtime overhead when enabled may be more appealing
because only those users they actually benefit end up incurring that
overhead.  E.g. you could enable `-ffloat-store` for the non-SSE
build.

The problem with actually fixing this is that it seems disproportionately
hard to do, and it only benefits hardware that's almost nobody still
uses.  It's hard to justify spending much time on this when that time
could be spent working on things that will benefit users on all
platforms.

Ways to try to address this I can see:

* Moving all the code that calculates weights to C, then using
  -fexcess-precision=standard to compile that code.

  Downsides are it's a lot of work and would make the code less
  maintainable, while introducing some runtime overhead for everyone
  (because of the need for more cross-object-file function calls),
  and at least some additional runtime overhead when enabled.

  Also this only solves the problem for one compiler: GCC (even clang
  which seems to implement most GCC options doesn't implement it).

* Compile with -ffloat-store.  Unclear how much work it would be (GCC
  manual says we'd also need to "[modify code] to store all pertinent
  intermediate computations into variables") but it will add significant
  runtime overhead due to having to store values to memory and reload
  rather than keeping them in registers.

  Also doesn't solve the problem for all compilers (it seems clang
  doesn't support `-ffloat-store` either for example).

* Add something like `VOLATILE_FOR_EXCESS_PRECISION` which is empty
  except on platforms with excess precision where it's `volatile`
  and add that throughout the code where weights values are
  manipulated.

  Downsides are it's a lot of work, makes the code less readable, adds
  likely significant runtime overhead when enabled, and it's hard to
  know exactly where it's needed - too many places adds more overhead;
  too few means potential bugs.  Bugs manifesting depends when values
  are spilled by the compiler's register allocator, so many will be
  latent with a particular compiler but then pop up with a new version
  of that same compiler.

* Calculate all weights using integers (or a fixed-point implementation
  which uses integers internally).  A lot of work to make things slower,
  and either we use this everywhere (penalising most users for no
  reason) or only when needed, in which case bugs specific to (for
  example due to overflow or underflow) will likely linger because
  almost nobody is using it.  Only using it when needed would also
  mean the weights calculated would be different between the two
  variants, so for example remote backend wouldn't work properly unless
  all the machines involved ran the same variant.

* Calculate all weights using `float` instead of `double` (not certain
  this avoids excess precision, but I think it does).

  Downsides are much more limited precision and range of values that can
  be represented, and that modern CPU FP units are probably optimised
  much more for `double`.  It seems likely to be a fair bit of work too
  (naively it's "just s/double/float/g" but I'd bet there's a lot of
  fallout to deal with).  Also the same issue as integers/fixed point of
  whether to use it everywhere vs remote backend between variants.

* Calculate all weights using `long double` instead of `double`.

  Downsides are it's a larger type so additional memory use, and it's
  implemented in software for some platforms which have hardware
  `double` (e.g. arm64) so there's also the same issue of whether to use
  it everywhere vs remote backend between variants.  I suspect even with
  hardware FP it's going to be a bit slower.

* Officially hard drop support for machines with excess precision.
  Distros with x86 baselines which don't support SSE would just patch in
  hacks to allow it to build (quite possibly different hacks per distro)
  so this doesn't really seem to actually help.

* Our current approach: default to enabling SSE on x86 (which requires
  compiler-specific handling but at least it seems likely to be
  something that any modern x86 compiler will support somehow) and
  recommend using this, but if this is explicitly disabled provide a
  build with no guarantees.

  We have some special handling for places where this causes serious
  breakage (e.g. segmentation faults from the undefined behaviour due to
  an inconsistent sort comparison I mentioned before).

There may be other approaches, but I'm doubtful there's a simple fix
for this.

Fundamentally anything which requires significant work is unlikely to
happen unless someone who cares a lot about ancient x86 does it.
I'm also not keen on solutions which make things worse for platforms
without excess precision, or which harm maintainability of the code.

> To not run testsuite to not see the bugs while looks like a solution,
> does not looks like a correct one.

I guess we could mark the testcases that are known to fail so they are
skipped in a build with excess precision.  As well as old x86, there's
also old m68k (68040 and later are apparently OK), though that's even
less relevant at this point (Debian still has an m68k port but it's
not been part of releases for quite a while now).

It's likely more such testcases will pop up with new compiler versions
though.

Cheers,
    Olly