Frama-C news and ideas

Wednesday, October 9 2013

The overflow when converting from float to integer is undefined behavior

Integer overflows in C

A previous post on this blog was a reminder that in C, signed integer arithmetic overflow is undefined behavior. In contrast, the behavior of overflows in conversions from integer type to signed integer type is implementation-defined. The C99 standard allows for an implementation-defined signal to be raised, but in practice, the widespread compilation platforms provide two's complement behavior. And you can trust that they will continue to do so, because it's implementation-defined. Compiler makers cannot change their mind willy-nilly as if it was undefined behavior:

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
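
As an illustration (this snippet is not from the original post), on the usual two's complement platforms the out-of-range conversion below simply wraps around:

#include <stdio.h>

int main(void)
{
  unsigned int u = 3000000000u; /* does not fit in a 32-bit int */
  int i = u;                    /* implementation-defined result */
  printf("%d\n", i);            /* prints -1294967296 on the usual
                                   two's complement platforms */
  return 0;
}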

Floating-point overflows in C

The C standard does not mandate IEEE 754 floating-point arithmetic. Still, in practice, modern compilation platforms, if they provide floating-point features at all, provide either exactly IEEE 754 binary32 and binary64 formats and computations, or the same formats and a close approximation of the same computations.

IEEE 754 floating-point defines +inf and -inf values, so that any real number can be approximated in the target IEEE 754 format (albeit, when it ends up represented as an infinity, not precisely). This means that for C compilation platforms that implement IEEE 754 for floating-point, the condition “the value can be represented in the new type” is always true. There is no reason to worry about undefined behavior caused by overflow in either floating-point arithmetic or in the conversion of a double to a float.

Or indeed, in a constant. Consider GCC's warning here:

$ cat t.c
#include <stdio.h>

int main()
{
  double big = 0x1.0p5000;
  printf("%f\n", big);
}

$ gcc-172652/bin/gcc -std=c99 -Wall t.c && ./a.out 
t.c: In function ‘main’:
t.c:5:3: warning: floating constant exceeds range of ‘double’ [-Woverflow]
inf

The number 2^5000, represented in C as 0x1.0p5000, is totally in the range of double, which goes up to inf. Clang similarly warns that “magnitude of floating-point constant too large for type double”. A proper warning message would be that 2^5000 cannot be represented precisely, instead of implying that it cannot be represented at all.
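
For completeness, the same holds at run-time: on the IEEE 754 platforms this post is about, converting an out-of-range double to float produces an infinity rather than undefined behavior. A minimal sketch (not from the original post):

#include <stdio.h>

int main(void)
{
  double big = 0x1.0p1000;   /* 2^1000, far above the largest finite float */
  float f = big;             /* rounds to +infinity on IEEE 754 platforms */
  printf("%f\n", f);         /* prints inf */
  return 0;
}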

Floating-point ↔ integer conversion overflows in C

But enough pedantry contests with compilers. The range of floating-point representations being what it is, we are left with only overflows in conversions from floating-point to integer to consider.


Suspense… (for the reader who did not pay attention to the title)


Overflows in conversions from floating-point to integer are undefined behavior. Clause 6.3.1.4 in the C99 standard makes them so:

6.3.1.4 Real floating and integer

1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

What can happen in practice when a C program invokes this particular flavor of undefined behavior? Is it as bad as dereferencing an invalid address, is it mostly harmless like signed integer arithmetic overflow, or what? Let us find out.

The following program converts the double representation of 2^31, the smallest positive integer that does not fit a 32-bit int, to int.

int printf(const char *, ...);

int main()
{
  int i = 0x1.0p31;
  printf("%d\n", i);
}

Frama-C's value analysis warns about undefined behavior in this program:

$ frama-c -val t.c

warning: overflow in conversion of 0x1.0p31 (2147483648.) 
   from floating-point to integer.
   assert -2147483649 < 0x1.0p31 < 2147483648;

Fine-tuning the assertion -2147483649 < 0x1.0p31 < 2147483648 was a riot, by the way. Do you see why?

My aging (but still valiant) PowerPC-based Mac appears to think that saturation is the way to go: the variable i is set to INT_MAX:

$ gcc -std=c99 t.c && ./a.out 
2147483647

Dillon Pariente was first to draw our attention to overflow in floating-point-to-integer conversions, which caused CPU exceptions on the target CPU for the code he was analyzing. I understood that target CPU to also be a PowerPC, so I suspect the behavior must be configurable on that architecture.

Dillon Pariente's example was along the lines of float f = INT_MAX; int i = f; which is also hilarious if you are into that sort of humor.


In order to really show how weird things can get on Intel processors, I need to modify the test program a bit:

int printf(const char *, ...);

volatile double v = 0;

int main()
{
  int i1 = 0x1.0p31;
  int i2 = 0x1.0p31 + v;
  printf("%d %d\n", i1, i2);
}

The volatile type qualifier precludes optimization, but there is no hardware or thread to change the value of variable v. The two expressions 0x1.0p31 and 0x1.0p31 + v are both expressions of type double that evaluate to 2^31.

Still, GCC and Clang, as if they were a single compiler, think that these two expressions needn't result in the same value when converted to int:

$ gcc t.c && ./a.out 
2147483647 -2147483648
$ clang  t.c && ./a.out 
2147483647 -2147483648

The results are different because one conversion was evaluated statically to be placed in %esi (2147483647) whereas the other was evaluated at run-time in %edx with the cvttsd2si instruction:

$ clang -S -O t.c  && cat t.s
...
_main:                                  ## @main
...
	movsd	_v(%rip), %xmm0
	addsd	LCPI0_0(%rip), %xmm0
	cvttsd2si	%xmm0, %edx
	leaq	L_.str(%rip), %rdi
	movl	$2147483647, %esi       ## imm = 0x7FFFFFFF
	xorb	%al, %al
	callq	_printf
...
L_.str:                                 ## @.str
	.asciz	 "%d %d\n"

Only undefined behavior allows GCC and Clang to produce different values for i1 and i2 here: the values of these two variables are computed by applying the same conversion to the same original double number, and they would have to be identical if the program had defined behavior.


Generally speaking, cvttsd2si always produces -0x80000000 in cases of overflow. That is almost like saturation, except that floating-point numbers that are too positive are wrapped to INT_MIN. One may think of it as saturating to either -0x80000000 or 0x80000000, and in the latter case, wrapping around to -0x80000000 because of two's complement. I do not know whether this rationale bears any resemblance to the one Intel's engineers used to justify their choice.


So one might think that this is the end of the story: as long as the conversion is done at run-time on an Intel platform, the compiler uses the cvttsd2si instruction. Overflows, if overflows there are, “saturate to INT_MIN” as the convention is on this platform. This can be confirmed experimentally with the following program variant:

#include <stdio.h>
#include <stdlib.h>

int main(int c, char **v)
{
  int i = 0x1.0p31 + strtod(v[1], 0);
  printf("%d\n", i);
}

This new program takes a number from the command-line and adds it to 2^31, so that there is no opportunity for compile-time evaluation. We expect the conversion to saturate to INT_MIN, and it does:

$ gcc -std=c99 t.c && ./a.out 1234 && ./a.out 12345 && ./a.out 123456
-2147483648
-2147483648
-2147483648


Wait! It gets more amusing still. Let us change the program imperceptibly:

int main(int c, char **v)
{
  unsigned int i = 0x1.0p32 + strtod(v[1], 0);
  printf("%u\n", i);
}

The behavior of run-time overflow in the conversion from double to integer changes completely:

$ gcc -m64 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999 
1234
123456
3755744407

But conversion saturates again, at zero this time, for the same program, when targeting IA-32:

$ gcc -m32 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999
0
0
0

Do you have an explanation for this one? Leave a message in the comments section below. The fastest author of a complete explanation wins a static analyzer license.

Conclusion

In conclusion, the overflow in the conversion from floating-point to integer is rather on the nasty side of C's undefined behavior spectrum. It may appear to behave consistently if the compilation targets an architecture where the underlying assembly instruction(s) saturate. Saturation is the behavior that compilers GCC and Clang implement when they are able to evaluate the conversion at compile-time. In these conditions, a lucky programmer may not actually observe anything strange.

The idiosyncrasies of other architectures may lead to very different results for overflowing conversions depending on parameters outside the programmer's control (constant propagation, for instance, is more or less efficient depending on the optimization level and may be difficult to predict, as we already complained about when discussing Clang targeting the 387 FPU).
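
When the conversion may overflow and portable behavior matters, the simplest way out is to guard the conversion explicitly rather than rely on what the target instruction happens to do. A minimal sketch (not from the original post), reusing the bounds from the Frama-C assertion quoted earlier; clamping is just one possible policy:

#include <limits.h>

/* Convert d to int, clamping out-of-range and NaN inputs instead of
   invoking undefined behavior. Both bounds are exactly representable
   as doubles and strictly bracket the range of a 32-bit int. */
int double_to_int_clamped(double d)
{
  if (!(d > -2147483649.0))   /* NaN, or too negative */
    return INT_MIN;
  if (d >= 2147483648.0)      /* too positive */
    return INT_MAX;
  return (int)d;              /* now guaranteed to be in range */
}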


Acknowledgements: In addition to Dillon Pariente, I discussed this topic with Boris Yakobowski, John Regehr, Stephen Canon, and StackOverflow users tenos, Sander De Dycker and Mike Seymour prior to writing this blog post.

Wednesday, September 25 2013

The problem with differential testing is that at least one of the compilers must get it right

A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations, and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing where all compilers get it wrong are.

A simple program to find double-rounding examples

The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the result of the multiplication of two double operands differs between the double-precision multiplication (the result is rounded directly to what fits the double format) and the extended-double multiplication (the mathematical result of multiplying two double numbers may not be representable exactly even with extended-double precision, so it is rounded to extended-double, and then rounded again to double, which changes the result).

$ cat dr.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <limits.h>

int main(){
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while(1){
    double d1 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    double d2 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    long double ld1 = d1;
    long double ld2 = d2;
    
    if (d1 * d2 != (double)(ld1 * ld2))
      printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n", 
	     d1, d2, d1*d2,
	     d1, d2, (double)(ld1 * ld2));
  }
}

The program is platform-dependent, but if it starts printing something like below, then a long list of double-rounding examples should immediately follow:

0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380

Results

In my case, what happened was:

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
...
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following while (1) with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:

$ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
$ clang -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

Conclusion

It became clear what had happened when looking at the assembly code:

$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm4, %xmm5
	ucomisd	%xmm5, %xmm5
	jnp	LBB0_1
...

Clang had compiled the test for deciding whether to call printf() into if (xmm5 != xmm5) for some register xmm5 (comparing a register with itself using ucomisd only sets the parity flag when the value is a NaN, so the jnp skips the call to printf() for every ordinary product).

$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm1, %xmm2
	ucomisd	%xmm2, %xmm2
	jnp	LBB1_1
...

And GCC had done the same. Although, to be fair, the two compilers appear to be using LLVM as back-end, so this could be the result of a single bug. But this would remove all the salt of the anecdote, so let us hope it isn't.


It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.

Monday, September 2 2013

The case for formal verification of existing software

Perry E. Metzger takes a look at formal verification. This is good stuff; there is a lot to agree with here.

However, agreeing with Perry's post alone would not make a very interesting counterpoint. If agreeing was the only thing I intended to do, I might not even have written this post. Instead I intended to add, and I apologize in advance for the predictability of my views, that while creating formally verified software from scratch is useful, verifying existing software is useful too.

Yes, formally verified software written from scratch can now be large enough in scope to be a compiler or a microkernel, but when verifying existing software, we can tackle the problem from the other end: we can pick any useful software component, and verify that. We can pick a software component so useful that it is already used by millions. If we succeed in verifying it, we have put formally verified software in the hands of millions of satisfied users. Transparently.


Take the example of the SSL implementation I am taking a couple of weeks to finish massaging through Frama-C. It is not as wide in scope as Quark, seL4 or CompCert. Neither am I aiming for the same kind of functional correctness as these projects are: I am only verifying the absence of undefined behaviors in the component, and verifying the functional dependencies of the cryptographic primitives(*).

But PolarSSL is useful. Plus, by its very nature, it is exposed to security attacks (SSL is the protocol behind HTTPS). And the former three examples are full-blown research projects, in contrast to my single person-month effort.

The bonus is that the formally verified PolarSSL can be compiled and embedded with all the same compilers and into all the same firmwares as the earlier non-verified version. It is the same as the non-verified version except maybe for a couple of bugfixes, and the confidence that for an identified usage pattern, no more undefined behavior bugs will ever need to be fixed.

All in all, the formal verification of existing code, despite its differences from its “from scratch” counterpart, has too become a force to be reckoned with.


(*) translation: I use Frama-C option -deps and I compare the result to what I expected


Acknowledgement: I got the link to Perry E. Metzger's post through Toby Murray.

Saturday, August 24 2013

Function pointers in C

This post contains a complete list of everything a C program can do with a function pointer, for a rather reasonable definition of “do”. Examples of things not to do with a function pointer are also provided. That list, in contrast, is in no way exhaustive.

What a C program can do with a function pointer

Convert it to a different function pointer type

A function pointer can be converted to a different function pointer type. The C99 standard's clause 6.3.2.3:8 starts:

“A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer.”

Call the pointed function with the original type

Clause 6.3.2.3:8 continues:

“If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.”

Alright, so the above title is slightly sensationalistic: the pointed function can be called with a compatible type. After typedef int t;, the types t and int are compatible, and so are t (*)(t) and int (*)(int), the types of functions taking a t and returning a t and of functions taking an int and returning an int, respectively.
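
As a small illustration (a sketch with hypothetical names, not from the original post), here are the two legal operations in one program: converting a function pointer to another function pointer type and back, and calling through a pointer whose type is compatible with the function's type:

typedef int t;

int inc(int x) { return x + 1; }

int main(void)
{
  /* Conversion to a different function pointer type and back:
     the round-trip pointer compares equal to the original (6.3.2.3:8). */
  void (*q)(void) = (void (*)(void))inc;
  int (*back)(int) = (int (*)(int))q;

  /* Calling through a pointer of a compatible type: t is int,
     so t (*)(t) and int (*)(int) are compatible types. */
  t (*p)(t) = inc;

  return back(1) + p(2);   /* 2 + 3 */
}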

There is no third thing a C program can do with a function pointer

Seriously. The C99 standard has uintptr_t, a recommended integer type to convert data pointers to, but there is not even an equivalent integer type to store function pointers.

What a C program cannot do with a function pointer

Convert it to an ordinary pointer

Function pointers should not be converted to char * or void *, both of which are intended for pointers to data (“objects” in the vocabulary of the C standard). Historically, there have been plenty of reasons why pointers to functions and pointers to data might not have the same representation. With 64-bit architectures, the same reasons continue to apply nowadays.
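
When genericity is what is wanted, a portable alternative is to store function pointers behind a generic function pointer type rather than behind void *. A minimal sketch (hypothetical names, not from the original post):

#include <stdio.h>

/* Any function pointer can be converted to this type and back
   without loss (6.3.2.3:8). */
typedef void (*generic_fn)(void);

struct callback {
  generic_fn fn;   /* stored generically, never called as such */
};

static void hello(const char *who)
{
  printf("hello, %s\n", who);
}

int main(void)
{
  struct callback cb = { (generic_fn)hello };
  /* Convert back to the original type before calling. */
  void (*call)(const char *) = (void (*)(const char *))cb.fn;
  call("world");
  return 0;
}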

Call the pointed function with an incompatible type

Even if you know that type float is 32-bit, the same size as int on your platform, the following is undefined:

void f(int x);

int main(){
  void (*p)(float) = f;
  (*p)(3);
}

The line void (*p)(float) = f;, which defines a variable p of type “pointer to function that takes a float”, and initializes it with the conversion of f, is legal as per 6.3.2.3:8. However, the following statement, (*p)(3); is actually equivalent to (*p)((float)3);, because the type of p is used to decide how to convert the argument prior to the call, and it is undefined because p points to a function that requires an int as argument.


Even if you know that the types int and long are both 32-bit and virtually indistinguishable on your platform (you may be using an ILP32 or an IL32P64 platform), the types int and long are not compatible. Josh Haberman has written a nice essay on this precise topic.

Consider the program:

void f(int x);

int main(){
  void (*p)(long) = f;
  (*p)(3);
}

This time the statement is equivalent to (*p)((long)3);, and it is undefined, even if long and int are both 32-bit (substitute long and long long if you have a typical I32LP64 platform).


Lastly, the example that prompted this post was, in a bit of Open-Source code, the creation of a new execution thread. The example can be simplified into:

void apply(void (*f)(void*), void *arg)
{
  f(arg);
}

void fun(int *x){
  // work work
  *x = 1;
}

int data;

int main(){
  apply(fun, &data);
}

The undefined behavior is not visible: it takes place inside function apply(), which stands in here for a library function (it was pthread_create() in the original example). But it is there: the function apply() expects a pointer to function that takes a void* and applies it as such. The types int * and void * are not compatible, and neither are the types of functions that take these arguments.

Note that gcc -Wall warns about the conversion when passing fun to apply():

t.c:11: warning: passing argument 1 of ‘apply’ from incompatible pointer type

Fixing this warning with a cast to void (*)(void*) is a programmer mistake. The bug indicated by the warning is that there is a risk that fun() will be applied with the wrong type, and this warning is justified here, since fun() will be applied with the wrong type inside function apply(). If we “fix” the program this way:

$ tail -3 t.c
int main(){
  apply((void (*)(void*))fun, &data);
}
$ gcc -std=c99 -Wall t.c
$ 

The explicit cast to (void (*)(void*)) silences the compiler, but the bug is still in the same place, in function apply().


Fortunately gcc -std=c99 -Wall is not the only static analyzer we can rely on. Frama-C's value analysis warns where the problem really is, in function apply(), and it warns for both the version with implicit conversion and the version with explicit cast:

$ frama-c -val t.c
…
[value] computing for function apply <- main.
        Called from t.c:14.
t.c:3:[value] warning: Function pointer and pointed function 'fun'  have incompatible types:
        void (void *) vs. void (int *x). assert(function type matches)

The correct way to use function apply() without changing it is to make a function with the correct type for it, and to pass that function to apply():

void stub(void *x){
  fun(x);  
}
…
  apply(stub, &data);

Note that in the above, x is implicitly converted when passed to function fun(), the same way that &data is implicitly converted to void* when passed to apply().

Conclusion

There is almost nothing you can do in C with a function pointer. The feature is still very useful and instills a bit of genericity in an otherwise decidedly low-level language.

Function pointers are not much used in the standard library: qsort() is, with pthread_create(), one of the few widely used functions that require a function pointer. Like pthread_create(), qsort() is often misused: it has its own entry in the C FAQ.


Jens Gustedt provided advice in the writing of this post.

Monday, August 5 2013

Exact case management in floating-point library functions

The documentation of the correctly-rounded CRlibm floating-point library states, for the difficult pow() function (p. 159):

Directed rounding requires additional work, in particular in subnormal handling and in exact case management. There are more exact cases in directed rounding modes, therefore the performance should also be inferior.

The phrase “exact case” refers to inputs that need to be treated specially because no number of “Ziv iterations”, at increasing precisions, can ever resolve which way the rounding should go.


Quiz: Isn't an exact case an exact case independently of the rounding mode? How can exact cases vary with the rounding mode?


If you can answer the above quiz without having to rubber duck through the entire question on an internet forum, you have me beat.

Wednesday, July 31 2013

From Pascal strings to Python tuples

Quiz time

What does the program below do?

#include <stdio.h>

int main(){
  struct {
    int t[4];
    int u;
  } v;
  v.u = 3;
  v.t[4] = 4;
  printf("v.u=%d", v.u);
  return 0;
}

Two answers are “it prints v.u=4” and “it prints v.u=3”:

$ gcc t.c && ./a.out 
v.u=4
$ gcc -O2 t.c && ./a.out 
v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v. The problem is that v.t+4 is outside object v.t. So GCC does what it pleases, and when compiling with -O2, optimizes brutally:

$ gcc -S -O2 t.c && cat t.s
.LC0:
	.string	"v.u=%d\n"
…
	movl 	$3, %edx
	movl 	$.LC0, %esi
	movl	 $1, %edi
	xorl	%eax, %eax
	call	__printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c
t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4, but here the index is hard-coded as 4, so line 9 is only valid when 4 < 4. That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h"

typedef struct{
  int tab[1];
} ts;

int main() {
  ts *q = malloc(5*sizeof(int));
  q->tab[2]= 5;
  return 1;
}

If you guessed “invoke undefined behavior”, well done!


The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[1];

   /* ob_item contains space for 'ob_size' elements.
    * Items must normally not be NULL, except during construction when
    * the tuple is not yet visible outside the function that builds it.
    */
} PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, in the C99 standard we got flexible array members, a safe way to implement tuples and Pascal strings:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[];
…
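
For reference, an object with a flexible array member is allocated by reserving space for the header plus the desired number of elements. A minimal sketch (with a simplified, hypothetical tuple type, not Python's actual one):

#include <stddef.h>
#include <stdlib.h>

typedef struct {
  size_t ob_size;
  void *ob_item[];   /* flexible array member: incomplete array type */
} tuple;

tuple *make_tuple(size_t n)
{
  /* offsetof gives the size of the header; the n elements follow it. */
  tuple *t = malloc(offsetof(tuple, ob_item) + n * sizeof t->ob_item[0]);
  if (t)
    t->ob_size = n;
  return t;
}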

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).

Wednesday, July 24 2013

More on the precise analysis of C programs for FLT_EVAL_METHOD==2

Introduction

It started innocently enough. My colleagues were talking of supporting target compilers with excess floating-point precision. We saw that if analyzing programs destined to be compiled with strict IEEE 754 compilers was a lovely Spring day at the beach, analyzing for compilers that allow excess precision was Normandy in June, 1944. But we had not seen anything, yet.

The notion of compile-time computation depends on the optimization level

One first obvious problem was that of constant expressions that were evaluated at compile-time following rules that differed from run-time ones. And who is to say what is evaluated at compile-time and what at run-time? Why, it even depends, for one and the same compiler, on the optimization level:

#include <stdio.h>

int r1, r2;

double ten = 10.0;

int main(int c, char **v)
{
  ten = 10.0;
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  printf("r1=%d r2=%d\n", r1, r2);
}

Note how, compared to last time, we make the vicious one-line change of assigning variable ten again inside function main().

$ gcc -v
Target: x86_64-linux-gnu
…
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) 
$ gcc -mno-sse2 -mfpmath=387 -std=c99 -O2  s.c && ./a.out 
r1=1 r2=1
$ gcc -mno-sse2 -mfpmath=387 -std=c99  s.c && ./a.out 
r1=0 r2=1

So the problem is not just that the static analyzer must be able to recognize the computations that are done at compile-time. A precise static analyzer that went down this path would in addition have to model each of the tens of optimization flags of the target compiler and their effects on the definition of constant expression.


Fortunately for us, after many, varied complaints from GCC users, Joseph S. Myers decided that 387 floating-point math in GCC was at least going to be predictable. That would not solve all the issues that had been marked as duplicates of the infamous bug 323 over its lifetime, but it would answer the valid ones.

A ray of hope

Joseph S. Myers provided a reasonable interpretation of the effects of FLT_EVAL_METHOD in the C99 standard. The comparatively old compiler we used in the previous post and in the first section of this one does not contain the patch from that discussion, but recent compilers do. The most recent GCC I have available is SVN snapshot 172652 from 2011. It includes the patch. With this version of GCC, we compile and execute the test program below.

#include <stdio.h>

int r1, r2, r3, r4, r5, r6, r7;

double ten = 10.0;

int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  r3 = 0.1 == (double) (1.0 / ten);
  r4 = 0.1 == (double) (1.0 / 10.0);
  ten = 10.0;
  r5 = 0.1 == (1.0 / ten);
  r6 = 0.1 == (double) (1.0 / ten);
  r7 = ((double) 0.1) == (1.0 / 10.0);
  printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d\n", r1, r2, r3, r4, r5, r6, r7);
}

We obtain the following results, different from the results of the earlier version of GCC, but independent of the optimization level and understandable (all computations are done with FLT_EVAL_METHOD==2 semantics):

$ ./gcc-172652/bin/gcc -mno-sse2 -mfpmath=387 -std=c99  t.c && ./a.out 
r1=1 r2=1 r3=0 r4=0 r5=1 r6=0 r7=0

As per the C99 standard, the choice was made to give the literal 0.1 the value of 0.1L. I am happy to report that this simple explanation for the values of r2 and r7 can be inferred directly from the assembly code. Indeed, the corresponding constant is declared in assembly as:

.LC1:
	.long	3435973837
	.long	3435973836
	.long	16379
	.long	0


Quiz: why is it obvious in the above assembly code for a long double constant that the compiler used the long double approximation for 0.1 instead of the double one?


As described, the semantics of C programs compiled with FLT_EVAL_METHOD==2 are just as deterministic as if they were compiled with FLT_EVAL_METHOD==0. They give different results from the latter, but always the same ones, regardless of optimization level, interference from unrelated statements, and even regardless of the particular compiler generating code with FLT_EVAL_METHOD==2. In the discussion that followed between Joseph Myers and Ian Lance Taylor, this is called “predictable semantics” and it is a boon to anyone who wishes to tell what a program ought to do when executed (including but not limited to precise static analyzers).

Implementation detail: source-to-source transformation or architecture option?

Now that at least one C compiler can be said to have predictable behavior with respect to excess precision, the question arises of supporting FLT_EVAL_METHOD==2 in Frama-C. This could be one more of the architecture-dependent parameters such as the size of type int and the endianness.

The rules are subtle, however, and rather than letting each Frama-C plug-in implement them and get them slightly wrong, it would be less error-prone to implement these rules once and for all as a source-to-source translation from a program with FLT_EVAL_METHOD==2 semantics to a program that, when compiled or analyzed with FLT_EVAL_METHOD==0 semantics, computes the same thing as the first one.

The destination of the transformation can be a Frama-C AST

A translated program giving, when compiled with strict IEEE 754 semantics, the FLT_EVAL_METHOD==2 semantics of an existing program can be represented as an AST in Frama-C. Here is how the translation would work on an example:

double interpol(double u1, double u2, double u3)
{ 
  return u2 * (1.0 - u1) + u1 * u3;  
}

Function interpol() above can be compiled with either FLT_EVAL_METHOD==0 or with FLT_EVAL_METHOD==2. In the second case, it actually appears to have slightly better properties than in the first case, but the differences are minor.

A source-to-source translation could transform the function into that below:

double interpol_80(double u1, double u2, double u3)
{ 
  return u2 * (1.0L - (long double)u1) + u1 * (long double)u3;  
}

This transformed function, interpol_80(), when compiled or analyzed with FLT_EVAL_METHOD==0, behaves exactly like function interpol() compiled or analyzed with FLT_EVAL_METHOD==2. I made an effort here to insert only the minimum number of explicit conversions, but a Frama-C transformation plug-in would not need to be so punctilious.

The source of the transformation cannot be a Frama-C AST

There is however a problem with the implementation of the transformation as a traditional Frama-C transformation plug-in. It turns out that the translation cannot use the normalized Frama-C AST as source. Indeed, if we use a Frama-C command to print the AST of the previous example in textual form:

~ $ frama-c -print -kernel-debug 1 t.c
…
/* Generated by Frama-C */
int main(int c, char **v)
{
  /* Locals: __retres */
  int __retres;
  /* sid:18 */
  r1 = 0.1 == 1.0 / ten;
  /* sid:19 */
  r2 = 0.1 == 1.0 / 10.0;
  /* sid:20 */
  r3 = 0.1 == 1.0 / ten;
  /* sid:21 */
  r4 = 0.1 == 1.0 / 10.0;
  …
}

Explicit casts to a type that an expression already has, such as the casts to double in the assignments to variables r3 and r4, are erased by the Frama-C front-end as part of its normalization. For us, this will not do: these casts, although they convert a double expression to double, change the meaning of the program, as shown by the differences between the values of r1 and r3, and between the values of r2 and r4, when one executes our example.

This setback would not be insurmountable but it means complications. It also implies that FLT_EVAL_METHOD==2 semantics cannot be implemented by individual plug-ins, which looked like a possible alternative.


To conclude this section on a positive note, if the goal is to analyze a C program destined to be compiled to the thirty-year old 8087 instructions with a recent GCC compiler, we can build the version of Frama-C that will produce results precise to the last bit. The amount of work is not inconsiderable, but it is possible.

But wait!

But what about a recent version of Clang? Let us see, using the same C program as previously:

#include <stdio.h>

int r1, r2, r3, r4, r5, r6, r7;

double ten = 10.0;

int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  r3 = 0.1 == (double) (1.0 / ten);
  r4 = 0.1 == (double) (1.0 / 10.0);
  ten = 10.0;
  r5 = 0.1 == (1.0 / ten);
  r6 = 0.1 == (double) (1.0 / ten);
  r7 = ((double) 0.1) == (1.0 / 10.0);
  printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d\n", r1, r2, r3, r4, r5, r6, r7);
}
$ clang -v
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
$ clang -mno-sse2 -std=c99  t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=0 r7=1

Oh no! Everything is to be done again… Some expressions are evaluated at compile-time with different results than the run-time ones, as shown by the difference between r1 and r2. The explicit cast to double does not seem to have an effect for r3 and r4 as compared to r1 and r2. This is different from Joseph Myers's interpretation, but if it is because floating-point expressions are always converted to their nominal types before being compared, it may be a good astonishment-lowering move. The value of r5 differs from that of r1, pointing to a non-obvious demarcation line between compile-time evaluation and run-time evaluation. And the values of r5 and r6 differ, meaning that our interpretation “explicit casts to double have no effects” based on the comparison of the values of r1 and r3 on the one hand and r2 and r4 on the other hand, is wrong or that some other compilation pass can interfere.

What a mess! There is no way a precise static analyzer can be made for this recent version of Clang (with these unfashionable options). Plus the results depend on optimizations:

$ clang -mno-sse2 -std=c99 -O2  t.c && ./a.out
r1=0 r2=1 r3=0 r4=1 r5=1 r6=1 r7=1

FLT_EVAL_METHOD is not ready for precise static analysis

In conclusion, it would be possible, and only quite a lot of hard work, to make a precise static analyzer for programs destined to be compiled to x87 instructions by a modern GCC. But for most other compilers, even including recent ones, it is simply impossible: the compiler gives floating-point operations a meaning that only it knows.

This is the sort of problem we tackled in the Hisseo project mentioned last time. One of the solutions we researched was “Just do not make a precise static analyzer”, and another was “Just analyze the generated assembly code where the meaning of floating-point operations has been fixed”. A couple of years later, the third solution, “Just use a proper compiler”, is looking better and better. It could even be a certified one, although it does not have to. Both Clang and GCC, when targeting the SSE2 instruction set, give perfect FLT_EVAL_METHOD==0 results. We should all enjoy this period of temporary sanity until x86-64 processors all sport a fused-multiply-add instruction.

Two things I should point out as this conclusion's conclusion. First, with the introduction of SSE2, the IA-32 platform (and its x86-64 cousin) has gone from the worst platform still in existence for predictable floating-point results to the best. It has correctly rounded operations for the standard single-precision and double-precision formats, and it retains hardware support for an often convenient extended precision format. Second, the fused-multiply-add instruction is a great addition to the instruction set, and I for one cannot wait until I get my paws on a processor that supports it, but it is going to be misused by compilers to compile source-level multiplications and additions. Compilers have not become wiser. The SSE2 instruction set has only made it more costly for them to do the wrong thing than to do the right one. They will break predictability again as soon as the opportunity comes, and the opportunity is already in Intel and AMD's product pipelines.

Friday, July 19 2013

The word “binade” now has its Wikipedia page…

… but that's only because I created it.

If you are more familiar than me with Wikipedia etiquette, feel free to adjust, edit, or delete this page. Also, although a Wikipedia account is necessary to create a page, I think it is not required for editing, so you can add to the story too (but if you do not have an account you are perhaps no more familiar than me with Wikipedia etiquette).

Thursday, July 11 2013

Arithmetic overflows in Fluorine

There is a C quiz in the middle of this post, lost in the middle of all the reminiscing.

A history of arithmetic overflows in Frama-C

From the very beginnings in 2005, until after the first Open Source release in 2008, Frama-C's value analysis was assuming that all arithmetic overflows produced two's complement results. This seemed the right thing to do at the time.

Then an option was introduced to emit alarms on undefined overflows. John Regehr suggested it after testing one of the Open Source versions. The option was turned off by default. If a value analysis user turned the option on, any undefined arithmetic overflow in the program would be detected and reported as a possible error, with the same gravity as dereferencing NULL or accessing an array out of its bounds.

Later, a helpful reminder was added to the value analysis' output: in the default behavior of not emitting alarms, an informative message was emitted instead—if such an overflow was detected—about two's complement being assumed.

There was one last change in the last release, Fluorine. Actually, two changes: the name of the option for emitting alarms on undefined overflows changed, and the default value changed. The setting is now to emit alarms by default, and can be changed to not emitting them, for instance if the target code is destined to be compiled with gcc -fno-strict-overflow -fwrapv, in which case all overflows happening during execution can be expected to produce two's complement results.


One aspect remains unchanged in the above evolution: the discussion only applies to undefined overflows.

The philosophy was always to analyze programs as they were written, and not to force any change of habit on software developers. The initial choice not to warn about overflows was because we knew there were so many of these—most of them intentional—that we would be deluging the user with what would feel like a flood of false positives.

The gradual shift towards more arithmetic overflow awareness is a consequence of the fact that in C, some arithmetic overflows are undefined behavior. Compilers display increasing sophistication when optimizing the defined behaviors to the detriment of the predictability of undefined ones. To make a long story short, the “overflows produce 2's complement results” heuristic was wrong for some programs as compiled by some optimizing compilers.

In keeping with the same philosophy, “overflows” that are defined according to the C99 standard have always been treated by the value analysis plug-in with the semantics mandated by the standard. Those overflows that the standard says must have “implementation-defined” results are treated with the semantics that the overwhelming majority of compilation platforms give them (and it remains possible to model other architectures as the need arises).

A quiz

Other static analyzers may also warn for arithmetic overflows, but the philosophy can be different. The philosophy may for instance be that any overflow, regardless of whether it is defined or implementation-defined according to the letter of the standard, might be unintentional and should be brought to the attention of the developer.

In the few examples below, the goal is to predict whether Frama-C's value analysis with its new default setting in Fluorine would emit an alarm for an overflow. For extra credit, a secondary goal is to predict whether another static analyzer that warns for all overflows would warn. We assume a 32-bit int and a 16-bit short, same as (almost) everybody has.

int a = 50000;
int b = 50000;
int r = a * b;

unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

int a = 50000;
int b = 50000;
unsigned int r = a * b;

short a = 30000;
short b = 30000;
short r = a * b;

unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Answers

int a = 50000;
int b = 50000;
int r = a * b;

There is an overflow in this snippet (mathematically, 50000 * 50000 is 2500000000, which does not fit in an int). This overflow is undefined, so the value analysis warns about it.


unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

The multiplication is an unsigned int multiplication, and when the mathematical result of unsigned operations is out of range, the C99 standard mandates that overflows wrap around. Technically, the C99 standard says “A computation involving unsigned operands can never overflow, …” (6.2.5:9) but we are using the word “overflow” with the same meaning as everyone outside the C standardization committee including Wikipedia editors.

To sum up, in the C99 standard, overflows in signed arithmetic are undefined and there are no overflows in unsigned arithmetic (meaning that unsigned overflows wrap around).


int a = 50000;
int b = 50000;
unsigned int r = a * b;

The multiplication is again a signed multiplication. It does not matter that the result is destined to an unsigned int variable because in C, types are inferred bottom-up. So the value analysis warns about an undefined overflow in the multiplication here.


short a = 30000;
short b = 30000;
short r = a * b;

There is no overflow here in the multiplication because the last line behaves as short r = (short) ((int) a * (int) b);. The justification for this behavior can be found in clause 6.3.1 of the C99 standard about conversions and promotions (the general idea is that arithmetic never operates on types smaller than int or unsigned int. Smaller types are implicitly promoted before arithmetic takes place). The product 900000000 does fit in the type int of the multiplication. But then there is a conversion when the int result is assigned to the short variable r. This conversion is implementation-defined, so the value analysis does not warn about it, but another static analyzer may choose to warn about this conversion.


unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Perhaps contrary to expectations, there is an undefined overflow in the multiplication a * b in this example. Right in the middle of the aforementioned 6.3.1 clause in the C99 standard, on the subject of the promotion of operands with smaller types than int, the following sentence can be found:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.

All values of a 16-bit unsigned short fit a 32-bit int, so the third line is interpreted as unsigned int r = (unsigned int) ((int) a * (int) b);.


Incidentally, things would be completely different in this last example if int and short were the same size, say if int was 16-bit. In this case the third line would be equivalent to unsigned int r = (unsigned int) a * (unsigned int) b; and would only contain an unsigned, innocuous overflow.
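
One common way to keep this last multiplication defined on platforms with 32-bit int is to promote one operand to unsigned int by hand, so that the operation itself is performed in unsigned arithmetic. A minimal sketch (not from the original post):

unsigned short a = 50000;
unsigned short b = 50000;
/* The explicit conversion forces an unsigned int multiplication, which
   cannot overflow in the sense of the standard (out-of-range results
   wrap around); here the product 2500000000 even fits in 32 bits. */
unsigned int r = (unsigned int)a * b;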

Wrapping up

In Fluorine, the option to activate or deactivate the emission of these undefined arithmetic overflow alarms is called -warn-signed-overflow (the opposite version for no alarms being -no-warn-signed-overflow). I felt that providing this piece of information earlier would have rendered the quiz too easy.

Although Frama-C's value analysis adheres to the semantics of C and only warns for undefined overflows, it is possible to use Frama-C to check for the other kinds of overflows by using another plug-in, Rte, to automatically annotate the target C program with ACSL assertions that express the conditions for overflows. Note that that post pre-dates the Fluorine release and is written in terms of the old options.

Saturday, July 6 2013

On the precise analysis of C programs for FLT_EVAL_METHOD==2

There has been talk recently amongst my colleagues of Frama-C-wide support for compilation platforms that define FLT_EVAL_METHOD as 2. Remember that this compiler-set value, introduced in C99, means that all floating-point computations in the C program are made with long double precision, even if the type of the expressions they correspond to is float or double. This post is a reminder, to the attention of these colleagues and myself, of pitfalls to be anticipated in this endeavor.


We are talking of C programs like the one below.

#include <stdio.h>

int r1;

double ten = 10.0;

int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  printf("r1=%d\n", r1);
}

With a C99 compilation platform that defines FLT_EVAL_METHOD as 0, this program prints "r1=1", but with a compilation platform that sets FLT_EVAL_METHOD to 2, it prints “r1=0”.

Although we are discussing non-strictly-IEEE 754 compilers, we are assuming IEEE 754-like floating-point: we're not in 1980 any more. Also, we are assuming that long double has more precision than double, because the opposite situation would make any discussion specifically about FLT_EVAL_METHOD == 2 quite moot. In fact, we are precisely thinking of compilation platforms where float is IEEE 754 single-precision (now called binary32), double is IEEE 754 double-precision (binary64), and long double is the 8087 80-bit double-extended format.
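
A quick way to check whether a given compilation platform matches these assumptions is to print the relevant macros from <float.h>. A minimal sketch (not from the original post); on the platforms described above one expects 53 significand bits for double and 64 for the x87 long double:

#include <float.h>
#include <stdio.h>

int main(void)
{
  printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
  printf("FLT_MANT_DIG = %d, DBL_MANT_DIG = %d, LDBL_MANT_DIG = %d\n",
         FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG);
  return 0;
}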


Let us find ourselves a compiler with the right properties:

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
…
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) 
$ gcc -mfpmath=387 -std=c99 t.c && ./a.out 
r1=0

Good! (it seems)

The test program sets r1 to 0 because the left-hand side 0.1 of the equality test is the double-precision constant 0.1, whereas the right-hand side is the double-extended precision result of the division of 1 by 10. The two differ because 0.1 cannot be represented exactly in binary floating-point, so the long double representation is closer to the mathematical value and thus different from the double representation. We can make sure this is the right explanation by changing the expression for r1 to 0.1L == (1.0 / ten), in which the division is typed as double but computed as long double, then promoted to long double in order to be compared to 0.1L, the long double representation of the mathematical constant 0.1. This change causes r1 to receive the value 1 with our test compiler, whereas the change would make r1 receive 0 if the program was compiled with a strict IEEE 754 C compiler.
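
Spelled out, the variant looks like this (a sketch, not verbatim from the post):

#include <stdio.h>

double ten = 10.0;

int main(void)
{
  /* Under FLT_EVAL_METHOD == 2 the division is typed double but computed
     as long double, then promoted to long double for the comparison with
     the long double constant 0.1L, so the two sides are equal. */
  int r1 = 0.1L == (1.0 / ten);
  printf("r1=%d\n", r1);   /* 1 with the x87 compiler above,
                              0 with a strict IEEE 754 compiler */
  return 0;
}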

Pitfall 1: Constant expressions

Let us test the augmented program below:

#include <stdio.h>

int r1, r2;

double ten = 10.0;

int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  printf("r1=%d r2=%d\n", r1, r2);
}

In our first setback, the program prints “r1=0 r2=1”. The assignment to r2 has been compiled into a straight constant-to-register move, based on a constant evaluation algorithm that does not obey the same rules that execution does. If we are to write a precise static analyzer that corresponds to this GCC-4.4.3, this issue is going to seriously complicate our task. We will have to delineate a notion of “constant expressions” that the analyzer will evaluate with the same rules as GCC's rules for evaluating constant expressions, and then implement GCC's semantics for run-time evaluation of floating-point expressions for non-constant expressions. And our notion of “constant expression” will have to exactly match GCC's notion of “constant expression”, lest our analyzer be unsound.

Clarification: What is a “precise” static analyzer?

This is as good a time as any to point out that Frama-C's value analysis plug-in, for instance, is already able to analyze programs destined to be compiled with FLT_EVAL_METHOD as 2. By default, the value analysis plug-in assumes IEEE 754 and FLT_EVAL_METHOD == 0:

$ frama-c -val t.c
…
t.c:9:[kernel] warning: Floating-point constant 0.1 is not represented exactly.
 Will use 0x1.999999999999ap-4.
 See documentation for option -warn-decimal-float
…
[value] Values at end of function main:
          r1 ∈ {1}
          r2 ∈ {1}


The possibility of FLT_EVAL_METHOD being set to 2 is captured by the option -all-rounding-modes:

$ frama-c -val -all-rounding-modes t.c
…
t.c:9:[kernel] warning: Floating-point constant 0.1 is not represented exactly.
 Will use 0x1.999999999999ap-4.
 See documentation for option -warn-decimal-float
…
[value] Values at end of function main:
          r1 ∈ {0; 1}
          r2 ∈ {0; 1}

The sets of values predicted for variables r1 and r2 at the end of main() each contain the value given by the program as compiled by GCC-4.4.3, but these sets are not precise. If the program then went on to divide r1 by r2, Frama-C's value analysis would warn about a possible division by zero, whereas we know that with our compiler, the division is safe. The warning would be a false positive.

We are talking here about making a static analyzer with the ability to conclude that r1 is 0 and r2 is 1 because we told it that we are targeting a compiler that makes it so.


The above example command-lines are for Frama-C's value analysis, but during her PhD, Thi Minh Tuyen Nguyen has shown that the same kind of approach could be applied to source-level Hoare-style verification of floating-point C programs. The relevant articles can be found in the results of the Hisseo project.

To be continued

In the next post, we will find more pitfalls, revisit a post by Joseph S. Myers in the GCC mailing list, and conclude that implementing a precise static analyzer for this sort of compilation platform is a lot of work.
