Frama-C news and ideas

To content | To menu | To search | Frama-C Home

The problem with differential testing is that at least one of the compilers must get it right

A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations, and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing where all compilers get it wrong are.

A simple program to find double-rounding examples

The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the result of the multiplication of two double operands differs between the double-precision multiplication (the result is rounded directly to what fits the double format) and the extended-double multiplication (the mathematical result of multiplying two double numbers may not be representable exactly even with extended-double precision, so it is rounded to extended-double, and then rounded again to double, which changes the result).

$ cat dr.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <limits.h>

int main(){
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while(1){
    double d1 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    double d2 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    long double ld1 = d1;
    long double ld2 = d2;
    
    if (d1 * d2 != (double)(ld1 * ld2))
      printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n", 
	     d1, d2, d1*d2,
	     d1, d2, (double)(ld1 * ld2));
  }
}

The program is platform-dependent, but if it starts printing something like below, then a long list of double-rounding examples should immediately follow:

0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380

Results

In my case, what happened was:

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
...
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following while (1) with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:

$ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
$ clang -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

Conclusion

It became clear what had happened when looking at the assembly code:

$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm4, %xmm5
	ucomisd	%xmm5, %xmm5
	jnp	LBB0_1
...

Clang had compiled the test for deciding whether to call printf() into if (xmm5 != xmm5) for some register xmm5.

$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm1, %xmm2
	ucomisd	%xmm2, %xmm2
	jnp	LBB1_1
...

And GCC had done the same. Although, to be fair, the two compilers appear to be using LLVM as back-end, so this could be the result of a single bug. But this would remove all the salt of the anecdote, so let us hope it isn't.


It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.

Comments

1. On Wednesday, September 25 2013, 22:32 by John Regehr

My impression is that meaningful differential testing is impossible when the compiler developers say what we read in the first "non bug" here:

http://gcc.gnu.org/bugs/

It's hard to even understand their statement that rounding is a problem since IEEE has well-defined rounding.

The best I would hope to do with Csmith + GCC is try to get the compiler to crash....

2. On Wednesday, September 25 2013, 22:35 by John Regehr

Strictly speaking, the minimum requirement to do differential testing is that at least one compiler must return a different answer than the rest. It doesn't have to be correct.

3. On Wednesday, September 25 2013, 22:47 by pascal

Hello, John.

As phrased, the first non-bug is a cop-out. The programmer *may* know what results (s)he expects for the compilation platform (s)he is using. In addition, while the optimizations enabled by -ffast-math would allow the result to change, not enabling any of them (and enabling deterministic floating-point if targeting the x87; -std=c99 seems to activate this) should mean that the same results are always obtained.

My colleague Guillaume Melquiond boasts having reported several floating-point bugs in GCC, which were then fixed. He went as far as claiming that no bug was left to his knowledge (and indeed I wouldn't have been so confident if a much more recent GCC version that I had bootstrapped had not given me the list of double-rounding issues that I was after in the first place). The GCC developers may not appreciate tens of floating-point bugs reported at once, but they seem to be taking floating-point seriously.

4. On Monday, September 30 2013, 07:53 by Andrey karpov

Relevant article: 64-bit programs and floating-point calculations - www.viva64.com/en/b/0074/

5. On Monday, September 30 2013, 15:15 by pascal

Hello, Andrey.

Your article is more about the phenomenon of “excess precision” in floating-point computations in C programs. C99 compilers should in principle indicate that intermediate results are computed with a precision greater than their types would indicate by defining FLT_EVAL_METHOD to a nonzero value. The program in this post is explicitly intended to be evaluated on a platform where FLT_EVAL_METHOD is defined as zero. Or, in other words, the behavior observed with the two compilers I used (both 64-bit compilers) is incorrect—it looks like a wrongly applied Common Subexpression Elimination, where the subexpressions are not the same.

But indeed, you are right, the reason I was interested in double rounding and wrote the original program was as a part of a discussion about C99's FLT_EVAL_METHOD.

By the way, the sentence “Only addition of a value equal to or larger than FLT_EPSILON to 1.0f will increase the value of the variable: 1.0f + FLT_EPSILON !=1.0f” in your post is slightly approximative. Adding to 1.0f a value slightly larger than half FLT_EPSILON produces a result different from 1.0f in round-to-nearest-even mode.

6. On Sunday, December 8 2013, 14:52 by pascal
Since this post was published, Stephen Canon took it upon himself to report the bug in LLVM, and then to provide the patch that fixes it: http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/167711