xapian-core 1.4.21: 20 tests fail on i686
Vitaly Chikunov
vt at altlinux.org
Fri Oct 28 23:27:01 BST 2022
Olly,
Thanks for detailed explanations! I'll need to think about it more.
Regards,
On Wed, Oct 26, 2022 at 11:42:51PM +0100, Olly Betts wrote:
> On Wed, Oct 26, 2022 at 03:37:38PM +0300, Vitaly Chikunov wrote:
> > > If you insist on using --disable-sse, the simplest solution is to not
> > > run the testsuite. (The purpose of the testsuite is to find bugs; an
> > > effect of --disable-sse is essentially to introduce bugs...)
> >
> > I had no intention of producing buggy packages, and did not expect
> > (looking sane) configure option to intentionally introduce bugs (and not
> > just slowness).
> > JFYI, some other distros package while compiling with `--disable-sse`:
> >
> > OpenSUSE: https://build.opensuse.org/package/view_file/openSUSE:Factory/xapian-core/xapian-core.spec?expand=1
> > OpenMandriva: https://github.com/OpenMandrivaAssociation/xapian-core/blob/rolling/xapian-core.spec
>
> Sigh, I guess people don't read the documentation. Perhaps we need to
> make `--disable-sse` emit suitably scary warnings.
>
> > > I'm not sure what the ALT Linux baseline for i686 is, but if you really
> > > need to build binary packages which will run on processors without SSE,
> > > I'd strongly recommend the approach describing in the last entry here:
> > >
> > > https://trac.xapian.org/wiki/PackagingXapian
> >
> > I understand this approach (this is perhaps how Debian packages), but
> > think users of old hardware still want correctness (at least internal to
> > the library).
>
> To be clear, Debian doesn't mandate this approach.
>
> However, I really do think it's the right approach for packages of
> xapian-core to provide a build without these problems for the vast
> majority of x86 users who do have a CPU with SSE. Having two builds
> is partly orthogonal to how to try to address this for CPUs without SSE,
> but it does mean solutions which give better correctness but with a
> significant runtime overhead when enabled may be more appealing
> because only those users they actually benefit end up incurring that
> overhead. E.g. you could enable `-ffloat-store` for the non-SSE
> build.
>
> The problem with actually fixing this is that it seems disproportionately
> hard to do, and it only benefits hardware that's almost nobody still
> uses. It's hard to justify spending much time on this when that time
> could be spent working on things that will benefit users on all
> platforms.
>
> Ways to try to address this I can see:
>
> * Moving all the code that calculates weights to C, then using
> -fexcess-precision=standard to compile that code.
>
> Downsides are it's a lot of work and would make the code less
> maintainable, while introducing some runtime overhead for everyone
> (because of the need for more cross-object-file function calls),
> and at least some additional runtime overhead when enabled.
>
> Also this only solves the problem for one compiler: GCC (even clang
> which seems to implement most GCC options doesn't implement it).
>
> * Compile with -ffloat-store. Unclear how much work it would be (GCC
> manual says we'd also need to "[modify code] to store all pertinent
> intermediate computations into variables") but it will add significant
> runtime overhead due to having to store values to memory and reload
> rather than keeping them in registers.
>
> Also doesn't solve the problem for all compilers (it seems clang
> doesn't support `-ffloat-store` either for example).
>
> * Add something like `VOLATILE_FOR_EXCESS_PRECISION` which is empty
> except on platforms with excess precision where it's `volatile`
> and add that throughout the code where weights values are
> manipulated.
>
> Downsides are it's a lot of work, makes the code less readable, adds
> likely significant runtime overhead when enabled, and it's hard to
> know exactly where it's needed - too many places adds more overhead;
> too few means potential bugs. Bugs manifesting depends when values
> are spilled by the compiler's register allocator, so many will be
> latent with a particular compiler but then pop up with a new version
> of that same compiler.
>
> * Calculate all weights using integers (or a fixed-point implementation
> which uses integers internally). A lot of work to make things slower,
> and either we use this everywhere (penalising most users for no
> reason) or only when needed, in which case bugs specific to (for
> example due to overflow or underflow) will likely linger because
> almost nobody is using it. Only using it when needed would also
> mean the weights calculated would be different between the two
> variants, so for example remote backend wouldn't work properly unless
> all the machines involved ran the same variant.
>
> * Calculate all weights using `float` instead of `double` (not certain
> this avoids excess precision, but I think it does).
>
> Downsides are much more limited precision and range of values that can
> be represented, and that modern CPU FP units are probably optimised
> much more for `double`. It seems likely to be a fair bit of work too
> (naively it's "just s/double/float/g" but I'd bet there's a lot of
> fallout to deal with). Also the same issue as integers/fixed point of
> whether to use it everywhere vs remote backend between variants.
>
> * Calculate all weights using `long double` instead of `double`.
>
> Downsides are it's a larger type so additional memory use, and it's
> implemented in software for some platforms which have hardware
> `double` (e.g. arm64) so there's also the same issue of whether to use
> it everywhere vs remote backend between variants. I suspect even with
> hardware FP it's going to be a bit slower.
>
> * Officially hard drop support for machines with excess precision.
> Distros with x86 baselines which don't support SSE would just patch in
> hacks to allow it to build (quite possibly different hacks per distro)
> so this doesn't really seem to actually help.
>
> * Our current approach: default to enabling SSE on x86 (which requires
> compiler-specific handling but at least it seems likely to be
> something that any modern x86 compiler will support somehow) and
> recommend using this, but if this is explicitly disabled provide a
> build with no guarantees.
>
> We have some special handling for places where this causes serious
> breakage (e.g. segmentation faults from the undefined behaviour due to
> an inconsistent sort comparison I mentioned before).
>
> There may be other approaches, but I'm doubtful there's a simple fix
> for this.
>
> Fundamentally anything which requires significant work is unlikely to
> happen unless someone who cares a lot about ancient x86 does it.
> I'm also not keen on solutions which make things worse for platforms
> without excess precision, or which harm maintainability of the code.
>
> > To not run testsuite to not see the bugs while looks like a solution,
> > does not looks like a correct one.
>
> I guess we could mark the testcases that are known to fail so they are
> skipped in a build with excess precision. As well as old x86, there's
> also old m68k (68040 and later are apparently OK), though that's even
> less relevant at this point (Debian still has an m68k port but it's
> not been part of releases for quite a while now).
>
> It's likely more such testcases will pop up with new compiler versions
> though.
>
> Cheers,
> Olly
More information about the Xapian-devel
mailing list