## The problem with differential testing is that at least one of the compilers must get it right

By pascal on Wednesday, September 25 2013, 20:22 - Permalink

A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations, and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing where all compilers get it wrong are.

## A simple program to find double-rounding examples

The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the result of the multiplication of two `double` operands differs between the double-precision multiplication (the result is rounded directly to what fits the `double` format) and the extended-double multiplication (the mathematical result of multiplying two `double` numbers may not be representable exactly even with extended-double precision, so it is rounded to extended-double, and then rounded again to `double`, which changes the result).

```c
$ cat dr.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <limits.h>

int main()
{
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while (1)
    {
      double d1 = ((unsigned long)rand()<<32) + ((unsigned long)rand()<<16) + rand();
      double d2 = ((unsigned long)rand()<<32) + ((unsigned long)rand()<<16) + rand();
      long double ld1 = d1;
      long double ld2 = d2;
      if (d1 * d2 != (double)(ld1 * ld2))
        printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n",
               d1, d2, d1*d2, d1, d2, (double)(ld1 * ld2));
    }
}
```

The program is platform-dependent, but if it starts printing something like below, then a long list of double-rounding examples should immediately follow:

```
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
```

## Results

In my case, what happened was:

```
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
...
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C
```

I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following `while (1)` with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:

```
$ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
$ clang -std=c99 -O2 -Wall dr.c && ./a.out
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C
```

## Conclusion

It became clear what had happened when looking at the assembly code:

```
$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm4, %xmm5
	ucomisd	%xmm5, %xmm5
	jnp	LBB0_1
...
```

Clang had compiled the test for deciding whether to call `printf()` into `if (xmm5 != xmm5)` for some register `xmm5`.

```
$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm1, %xmm2
	ucomisd	%xmm2, %xmm2
	jnp	LBB1_1
...
```

And GCC had done the same. To be fair, both compilers appear to use LLVM as a back-end, so this could be the result of a single bug. But that would remove all the salt from the anecdote, so let us hope it isn't.

It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.

## Comments

My impression is that meaningful differential testing is impossible when the compiler developers say what we read in the first "non bug" here:

http://gcc.gnu.org/bugs/

It's hard to even understand their statement that rounding is a problem since IEEE has well-defined rounding.

The best I would hope to do with Csmith + GCC is try to get the compiler to crash....

Strictly speaking, the minimum requirement to do differential testing is that at least one compiler must return a different answer than the rest. It doesn't have to be correct.

Hello, John.

As phrased, the first non-bug is a cop-out. The programmer *may* know what results (s)he expects for the compilation platform (s)he is using. In addition, while the optimizations enabled by -ffast-math would allow the result to change, not enabling any of them (and enabling deterministic floating-point if targeting the x87; -std=c99 seems to activate this) should mean that the same results are always obtained.

My colleague Guillaume Melquiond boasts having reported several floating-point bugs in GCC, which were then fixed. He went as far as claiming that no bug was left to his knowledge (and indeed I wouldn't have been so confident if a much more recent GCC version that I had bootstrapped had not given me the list of double-rounding issues that I was after in the first place). The GCC developers may not appreciate tens of floating-point bugs reported at once, but they seem to be taking floating-point seriously.

Relevant article: 64-bit programs and floating-point calculations - www.viva64.com/en/b/0074/

Hello, Andrey.

Your article is more about the phenomenon of “excess precision” in floating-point computations in C programs. C99 compilers should in principle indicate that intermediate results are computed with a precision greater than their types would indicate by defining FLT_EVAL_METHOD to a nonzero value. The program in this post is explicitly intended to be evaluated on a platform where FLT_EVAL_METHOD is defined as zero. Or, in other words, the behavior observed with the two compilers I used (both 64-bit compilers) is incorrect—it looks like a wrongly applied Common Subexpression Elimination, where the subexpressions are not the same.

But indeed, you are right, the reason I was interested in double rounding and wrote the original program was as a part of a discussion about C99's FLT_EVAL_METHOD.

By the way, the sentence “Only addition of a value equal to or larger than FLT_EPSILON to 1.0f will increase the value of the variable: 1.0f + FLT_EPSILON !=1.0f” in your post is slightly inaccurate. Adding to 1.0f any value strictly larger than half of FLT_EPSILON already produces a result different from 1.0f in round-to-nearest-even mode.