Frama-C news and ideas

The overflow when converting from float to integer is undefined behavior

Integer overflows in C

A previous post on this blog was a reminder that in C, signed integer arithmetic overflow is undefined behavior. In contrast, the behavior of overflows in conversions from an integer type to a signed integer type is implementation-defined. The C99 standard allows for an implementation-defined signal to be raised, but in practice, the widespread compilation platforms provide two's complement behavior. And you can trust that they will continue to do so, because it's implementation-defined. Compiler makers cannot change their mind willy-nilly as if it were undefined behavior:

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Floating-point overflows in C

The C standard does not mandate IEEE 754 floating-point arithmetic. Still, in practice, modern compilation platforms, if they provide floating-point features at all, provide either exactly IEEE 754 binary32 and binary64 formats and computations, or the same formats and a close approximation of the same computations.

IEEE 754 floating-point defines +inf and -inf values, so that any real number can be approximated in the target IEEE 754 format (albeit, when it ends up represented as an infinity, not precisely). This means that for C compilation platforms that implement IEEE 754 for floating-point, the condition “the value can be represented in the new type” is always true. There is no reason to worry about undefined behavior caused by overflow in either floating-point arithmetic or in the conversion of a double to a float.

Or indeed, in a constant. Consider GCC's warning here:

$ cat t.c
#include <stdio.h>

int main()
{
  double big = 0x1.0p5000;
  printf("%f\n", big);
}

$ gcc-172652/bin/gcc -std=c99 -Wall t.c && ./a.out 
t.c: In function ‘main’:
t.c:5:3: warning: floating constant exceeds range of ‘double’ [-Woverflow]

The number 2^5000, represented in C as 0x1.0p5000, is totally in the range of double, which goes up to inf. Clang similarly warns that “magnitude of floating-point constant too large for type double”. A proper warning message would be that 2^5000 cannot be represented precisely, instead of implying that it cannot be represented at all.

Floating-point ↔ integer conversion overflows in C

But enough pedantry contests with compilers. The range of floating-point representations being what it is, we are left with only overflows in conversions from floating-point to integer to consider.

Suspense… (for the reader who did not pay attention to the title)

Overflows in conversions from floating-point to integer are undefined behavior. Clause 6.3.1.4 in the C99 standard makes them so:

6.3.1.4 Real floating and integer

1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

What can happen in practice when a C program invokes this particular flavor of undefined behavior? Is it as bad as dereferencing an invalid address, mostly harmless like signed integer arithmetic overflow, or what? Let us find out.

The following program converts the double representation of 2^31, the smallest positive integer that does not fit a 32-bit int, to int.

int printf(const char *, ...);

int main()
{
  int i = 0x1.0p31;
  printf("%d\n", i);
}

Frama-C's value analysis warns about undefined behavior in this program:

$ frama-c -val t.c

warning: overflow in conversion of 0x1.0p31 (2147483648.) 
   from floating-point to integer.
   assert -2147483649 < 0x1.0p31 < 2147483648;

Fine-tuning the assertion -2147483649 < 0x1.0p31 < 2147483648 was a riot, by the way. Do you see why?

My aging (but still valiant) PowerPC-based Mac appears to think that saturation is the way to go: the variable i is set to INT_MAX:

$ gcc -std=c99 t.c && ./a.out 

Dillon Pariente was first to draw our attention to overflow in floating-point-to-integer conversions, which caused CPU exceptions on the target CPU for the code he was analyzing. I understood that target CPU to also be a PowerPC, so I suspect the behavior must be configurable on that architecture.

Dillon Pariente's example was along the lines of float f = INT_MAX; int i = f; which is also hilarious if you are into that sort of humor.

In order to really show how weird things can get on Intel processors, I need to modify the test program a bit:

int printf(const char *, ...);

volatile double v = 0;

int main()
{
  int i1 = 0x1.0p31;
  int i2 = 0x1.0p31 + v;
  printf("%d %d\n", i1, i2);
}

The volatile type qualifier precludes optimization, but there is no hardware or thread to change the value of variable v. The two expressions 0x1.0p31 and 0x1.0p31 + v are both expressions of type double that evaluate to 2^31.

Still GCC and Clang, like a single compiler, think that these two expressions needn't result in the same value when converted to int:

$ gcc t.c && ./a.out 
2147483647 -2147483648
$ clang  t.c && ./a.out 
2147483647 -2147483648

The results are different because one conversion was evaluated statically to be placed in %esi (2147483647) whereas the other was evaluated at run-time in %edx with the cvttsd2si instruction:

$ clang -S -O t.c  && cat t.s
_main:                                  ## @main
	movsd	_v(%rip), %xmm0
	addsd	LCPI0_0(%rip), %xmm0
	cvttsd2si	%xmm0, %edx
	leaq	L_.str(%rip), %rdi
	movl	$2147483647, %esi       ## imm = 0x7FFFFFFF
	xorb	%al, %al
	callq	_printf
L_.str:                                 ## @.str
	.asciz	 "%d %d\n"

Only undefined behavior allows GCC and Clang to produce different values for i1 and i2 here: the values of these two variables are computed by applying the same conversion to the same original double number, and should be identical if the program were defined.

Generally speaking, cvttsd2si always produces -0x80000000 in cases of overflow. That is almost like saturation, except that floating-point numbers that are too positive are wrapped to INT_MIN. One may think of it as saturating to either -0x80000000 or 0x80000000, and in the latter case, wrapping around to -0x80000000 because of two's complement. I do not know whether this rationale bears any resemblance to the one Intel's engineers used to justify their choice.

So one might think that this is the end of the story: as long as the conversion is done at run-time on an Intel platform, the compiler uses the cvttsd2si instruction. Overflows, if overflows there are, “saturate to INT_MIN” as the convention is on this platform. This can be confirmed experimentally with the following program variant:

#include <stdio.h>
#include <stdlib.h>

int main(int c, char **v)
{
  int i = 0x1.0p31 + strtod(v[1], 0);
  printf("%d\n", i);
}

This new program takes a number from the command-line and adds it to 2^31, so that there is no opportunity for compile-time evaluation. We expect the conversion to saturate to INT_MIN, and it does:

$ gcc -std=c99 t.c && ./a.out 1234 && ./a.out 12345 && ./a.out 123456

Wait! It gets more amusing still. Let us change the program imperceptibly:

int main(int c, char **v)
{
  unsigned int i = 0x1.0p32 + strtod(v[1], 0);
  printf("%u\n", i);
}

The behavior of run-time overflow in the conversion from double to integer changes completely:

$ gcc -m64 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999 

But conversion saturates again, at zero this time, for the same program, when targeting IA-32:

$ gcc -m32 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999

Do you have an explanation for this one? Leave a message in the comments section below. The fastest author of a complete explanation wins a static analyzer license.


In conclusion, the overflow in the conversion from floating-point to integer is rather on the nasty side of C's undefined behavior spectrum. It may appear to behave consistently if the compilation targets an architecture where the underlying assembly instruction(s) saturate. Saturation is the behavior that compilers GCC and Clang implement when they are able to evaluate the conversion at compile-time. In these conditions, a lucky programmer may not actually observe anything strange.

The idiosyncrasies of other architectures may lead to very different results for overflowing conversions depending on parameters outside the programmer's control (constant propagation, for instance, is more or less efficient depending on the optimization level and may be difficult to predict, as we already complained about when discussing Clang targeting the 387 FPU).

Acknowledgements: In addition to Dillon Pariente, I discussed this topic with Boris Yakobowski, John Regehr, Stephen Canon, and StackOverflow users tenos, Sander De Dycker and Mike Seymour prior to writing this blog post.


1. On Thursday, October 10 2013, 01:09 by John Regehr

Arianespace called. They want their rocket back.

2. On Thursday, October 10 2013, 01:14 by pascal

Hello, John.

The code in Ariane V that was blamed for the crash of the first flight —I pledge to stop using “maiden” with this meaning, it is sexist— was written in Ada.

Thanks to you, I have a new piece of commentary on this crash to add to my collection:

3. On Friday, October 11 2013, 15:37 by Simon Gerber

I can't reproduce your last example, I get the 64-bit behaviour for both IA-32 and x86_64 with gcc (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3.

4. On Friday, October 11 2013, 16:43 by pascal

Hello, Simon, I am glad that someone is taking the post seriously unlike everyone else so far. :)

There are no instructions in the x86 instruction sets to convert from floating-point to unsigned integer.

On Mac OS X for Intel, compiling a 64-bit program, the conversion from double to 32-bit unsigned int is compiled in a single instruction: the instruction for 64-bit conversions, cvttsd2siq, with destination a 64-bit register of which only the bottom 32 bits will subsequently be used as the 32-bit unsigned integer it represents:

$ cat t.c
#include <stdio.h>
#include <stdlib.h>
int main(int c, char **v)
{
  unsigned int i = 0x1.0p32 + strtod(v[1], 0);
  printf("%u\n", i);
}
$ gcc -m64 -S -std=c99 -O t.c && cat t.s
addsd LCPI1_0(%rip), %xmm0 ; this is the + from the C program
cvttsd2siq %xmm0, %rsi ; one-instruction conversion

This explains how, on that platform, a result modulo 2^32 can be obtained for doubles that are small enough (specifically, small enough to fit in a signed 64-bit integer).

In the old IA-32 instruction set, there is no instruction to convert a double to a 64-bit signed integer. The conversion to 32-bit unsigned has to be done by combining a few of the instructions that do exist, including two instructions cvttsd2si to convert from double to 32-bit signed integer:

$ gcc -m32 -S -std=c99 -O t.c && cat t.s
addsd LCPI1_0-L1$pb(%esi), %xmm0 ; this is the + from the C program
movsd LCPI1_1-L1$pb(%esi), %xmm1 ; conversion to unsigned int starts here
movapd %xmm0, %xmm2
subsd %xmm1, %xmm2
cvttsd2si %xmm2, %eax
xorl $-2147483648, %eax
ucomisd %xmm1, %xmm0
cvttsd2si %xmm0, %edx
cmovael %eax, %edx

Two alternative solutions are computed, respectively in %eax and in %edx. The alternatives are each correct on different definition domains. If the number to convert, in %xmm0, is larger than the constant 2^31 in %xmm1, then one alternative is chosen, otherwise, the other one is. The high-level algorithm, using only conversion from double to int, would be:

    if (d < 2^31) then (unsigned int)(int)d else (2^31 + (unsigned int)(int)(d - 2^31))

This translation of the C conversion from double to unsigned int gives the same saturating behavior as the 32-bit conversion instruction that it relies on:

$ gcc -m32 -std=c99 -O t.c && ./a.out 123456

What assembly code do you get for the program with gcc -m32 and gcc -m64 on your platform?

PS: this CMS sucks so feel free to write any reply you may have on and to post only the URL here.