xapian-core 1.4.21: 20 tests fail on i686

Fri Oct 28 23:27:01 BST 2022

Olly,

Thanks for the detailed explanations! I'll need to think about it more.

Regards,

On Wed, Oct 26, 2022 at 11:42:51PM +0100, Olly Betts wrote:
> On Wed, Oct 26, 2022 at 03:37:38PM +0300, Vitaly Chikunov wrote:
> > > If you insist on using --disable-sse, the simplest solution is to not
> > > run the testsuite.  (The purpose of the testsuite is to find bugs; an
> > > effect of --disable-sse is essentially to introduce bugs...)
> > 
> > I had no intention of producing buggy packages, and did not expect a
> > (sane-looking) configure option to intentionally introduce bugs (and not
> > just slowness).
> > JFYI, some other distros also build their packages with `--disable-sse`:
> > 
> >   OpenSUSE: https://build.opensuse.org/package/view_file/openSUSE:Factory/xapian-core/xapian-core.spec?expand=1
> >   OpenMandriva: https://github.com/OpenMandrivaAssociation/xapian-core/blob/rolling/xapian-core.spec
> 
> Sigh, I guess people don't read the documentation.  Perhaps we need to
> make `--disable-sse` emit suitably scary warnings.
> 
> > > I'm not sure what the ALT Linux baseline for i686 is, but if you really
> > > need to build binary packages which will run on processors without SSE,
> > > I'd strongly recommend the approach described in the last entry here:
> > > 
> > > https://trac.xapian.org/wiki/PackagingXapian
> > 
> > I understand this approach (this is perhaps how Debian packages it), but
> > I think users of old hardware still want correctness (at least internal
> > to the library).
> 
> To be clear, Debian doesn't mandate this approach.
> 
> However, I really do think it's the right approach for packages of
> xapian-core to provide a build without these problems for the vast
> majority of x86 users who do have a CPU with SSE.  Having two builds
> is partly orthogonal to how to try to address this for CPUs without SSE,
> but it does mean solutions which give better correctness but with a
> significant runtime overhead when enabled may be more appealing
> because only those users they actually benefit end up incurring that
> overhead.  E.g. you could enable `-ffloat-store` for the non-SSE
> build.
> 
> The problem with actually fixing this is that it seems disproportionately
> hard to do, and it only benefits hardware that almost nobody still
> uses.  It's hard to justify spending much time on this when that time
> could be spent working on things that will benefit users on all
> platforms.
> 
> Ways to try to address this I can see:
> 
> * Moving all the code that calculates weights to C, then using
>   -fexcess-precision=standard to compile that code.
>   
>   Downsides are it's a lot of work and would make the code less
>   maintainable, while introducing some runtime overhead for everyone
>   (because of the need for more cross-object-file function calls),
>   and at least some additional runtime overhead when enabled.
> 
>   Also this only solves the problem for one compiler: GCC (even clang
>   which seems to implement most GCC options doesn't implement it).
> 
> * Compile with -ffloat-store.  Unclear how much work it would be (GCC
>   manual says we'd also need to "[modify code] to store all pertinent
>   intermediate computations into variables") but it will add significant
>   runtime overhead due to having to store values to memory and reload
>   rather than keeping them in registers.
>   
>   Also doesn't solve the problem for all compilers (it seems clang
>   doesn't support `-ffloat-store` either for example).
> 
> * Add something like `VOLATILE_FOR_EXCESS_PRECISION` which is empty
>   except on platforms with excess precision where it's `volatile`
>   and add that throughout the code where weights values are
>   manipulated.
> 
>   Downsides are it's a lot of work, makes the code less readable, adds
>   likely significant runtime overhead when enabled, and it's hard to
>   know exactly where it's needed - too many places adds more overhead;
>   too few means potential bugs.  Whether a bug manifests depends on when
>   values are spilled by the compiler's register allocator, so many will
>   be latent with a particular compiler but then pop up with a new
>   version of that same compiler.
> 
> * Calculate all weights using integers (or a fixed-point implementation
>   which uses integers internally).  A lot of work to make things slower,
>   and either we use this everywhere (penalising most users for no
>   reason) or only when needed, in which case bugs specific to it (for
>   example due to overflow or underflow) will likely linger because
>   almost nobody is using it.  Only using it when needed would also
>   mean the weights calculated would be different between the two
>   variants, so for example remote backend wouldn't work properly unless
>   all the machines involved ran the same variant.
> 
> * Calculate all weights using `float` instead of `double` (not certain
>   this avoids excess precision, but I think it does).
> 
>   Downsides are much more limited precision and range of values that can
>   be represented, and that modern CPU FP units are probably optimised
>   much more for `double`.  It seems likely to be a fair bit of work too
>   (naively it's "just s/double/float/g" but I'd bet there's a lot of
>   fallout to deal with).  Also the same issue as with integers/fixed
>   point: either we use it everywhere, or the remote backend won't work
>   properly between variants.
> 
> * Calculate all weights using `long double` instead of `double`.
> 
>   Downsides are it's a larger type so additional memory use, and it's
>   implemented in software for some platforms which have hardware
>   `double` (e.g. arm64), so there's also the same issue: either we use
>   it everywhere, or the remote backend won't work properly between
>   variants.  I suspect even with hardware FP it's going to be a bit
>   slower.
> 
> * Officially hard drop support for machines with excess precision.
>   Distros with x86 baselines which don't support SSE would just patch in
>   hacks to allow it to build (quite possibly different hacks per distro)
>   so this doesn't really seem to actually help.
> 
> * Our current approach: default to enabling SSE on x86 (which requires
>   compiler-specific handling but at least it seems likely to be
>   something that any modern x86 compiler will support somehow) and
>   recommend using this, but if this is explicitly disabled provide a
>   build with no guarantees.
>   
>   We have some special handling for places where this causes serious
>   breakage (e.g. segmentation faults from the undefined behaviour due to
>   an inconsistent sort comparison I mentioned before).
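
To make the inconsistent sort comparison concrete, here is a rough Python
sketch.  It is only a simulation: exact `Fraction` arithmetic stands in for
a weight still held at wider-than-double precision, and the ordinary Python
float product stands in for the same weight after it has been rounded to a
64-bit double.  The function names are made up for illustration and are not
taken from xapian-core.

```python
from fractions import Fraction


def weight_in_register(a: float, b: float) -> Fraction:
    # Exact product: a stand-in for a value still held with excess
    # precision (a simulation, not real x87 behaviour).
    return Fraction(a) * Fraction(b)


def weight_spilled(a: float, b: float) -> Fraction:
    # The same product after rounding to a 64-bit double, as happens
    # when the value is stored to memory.
    return Fraction(a * b)


def less_than(x: Fraction, y: Fraction) -> bool:
    # Comparator of the kind a sort would use.
    return x < y


a, b = 0.1, 0.3
in_register = weight_in_register(a, b)
spilled = weight_spilled(a, b)

# A valid sort comparator must never order a value before or after
# itself, but here two copies of the "same" weight compare as different.
inconsistent = less_than(in_register, spilled) or less_than(spilled, in_register)
print(inconsistent)  # prints True
```

Both values come from the same expression, yet the comparator orders one
copy against the other, which is the kind of inconsistency a sort
implementation is entitled to assume never happens.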
> 
> There may be other approaches, but I'm doubtful there's a simple fix
> for this.
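
The point about variants producing different weights can be sketched in
Python too.  This assumes a made-up fixed-point format with 16 fractional
bits and emulates `float` by round-tripping through the `struct` module;
none of it reflects how xapian-core actually computes weights.

```python
import struct

SCALE_BITS = 16  # hypothetical fixed-point format: 16 fractional bits


def to_float32(x: float) -> float:
    # Round a Python float (a double) to the nearest single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]


def to_fixed(x: float) -> int:
    # Convert to an integer count of 1/2**SCALE_BITS units.
    return round(x * (1 << SCALE_BITS))


def weight_double(a: float, b: float) -> float:
    return a * b


def weight_float32(a: float, b: float) -> float:
    # Round the inputs and the result to single precision.
    return to_float32(to_float32(a) * to_float32(b))


def weight_fixed(a: float, b: float) -> float:
    # Integer multiply, then drop the extra fractional bits.
    product = (to_fixed(a) * to_fixed(b)) >> SCALE_BITS
    return product / (1 << SCALE_BITS)


a, b = 0.1, 0.3
results = {
    "double": weight_double(a, b),
    "float32": weight_float32(a, b),
    "fixed": weight_fixed(a, b),
}
for name, value in results.items():
    print(f"{name:8s} {value!r}")
```

All three results are close to 0.03 but no two are equal, so machines
running different variants would disagree on weights that are supposed to
be identical.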
> 
> Fundamentally anything which requires significant work is unlikely to
> happen unless someone who cares a lot about ancient x86 does it.
> I'm also not keen on solutions which make things worse for platforms
> without excess precision, or which harm maintainability of the code.
> 
> > Not running the testsuite so as not to see the bugs may look like a
> > solution, but it does not look like a correct one.
> 
> I guess we could mark the testcases that are known to fail so they are
> skipped in a build with excess precision.  As well as old x86, there's
> also old m68k (68040 and later are apparently OK), though that's even
> less relevant at this point (Debian still has an m68k port but it's
> not been part of releases for quite a while now).
> 
> It's likely more such testcases will pop up with new compiler versions
> though.
> 
> Cheers,
>     Olly