## Arithmetic overflows in Fluorine

By pascal on Thursday, July 11 2013, 14:09 - Permalink

There is a C quiz in the middle of this post, lost in the middle of all the reminiscing.

## A history of arithmetic overflows in Frama-C

From its very beginnings in 2005 until after the first Open Source release in 2008, Frama-C's value analysis assumed that all arithmetic overflows produced two's complement results. This seemed the right thing to do at the time.

Then an option was introduced to emit alarms on undefined overflows. John Regehr suggested it after testing one of the Open Source versions. The option was turned off by default. If a value analysis user turned the option on, any undefined arithmetic overflow in the program would be detected and reported as a possible error, with the same gravity as dereferencing NULL or accessing an array out of its bounds.

Later, a helpful reminder was added to the value analysis' output: in the default behavior of not emitting alarms, an informative message was emitted instead—if such an overflow was detected—about two's complement being assumed.

There was one last change in the latest release, Fluorine. Actually, two changes: the name of the option for emitting alarms on undefined overflows changed, and its default value changed. The analyzer now emits alarms by default; this can be turned off, for instance when the target code is destined to be compiled with `gcc -fno-strict-overflow -fwrapv`, in which case all overflows happening during execution can be expected to produce two's complement results.

One aspect remains unchanged in the above evolution: the discussion only applies to undefined overflows.

The philosophy was always to analyze programs as they were written, and not to force any change of habit on software developers. The initial choice not to warn about overflows was because we knew there were so many of these—most of them intentional—that we would be deluging the user with what would feel like a flood of false positives.

The gradual shift towards more arithmetic overflow awareness is a consequence of the fact that in C, some arithmetic overflows are undefined behavior. Compilers display increasing sophistication when optimizing the defined behaviors to the detriment of the predictability of undefined ones. To make a long story short, the “overflows produce 2's complement results” heuristic was wrong for some programs as compiled by some optimizing compilers.

In keeping with the same philosophy, “overflows” that are defined according to the C99 standard have always been treated by the value analysis plug-in with the semantics mandated by the standard. Those overflows that the standard says must have “implementation-defined” results are treated with the semantics that the overwhelming majority of compilation platforms give them (and it remains possible to model other architectures as the need arises).

## A quiz

Other static analyzers may also warn for arithmetic overflows, but the philosophy can be different. The philosophy may for instance be that any overflow, regardless of whether it is defined or implementation-defined according to the letter of the standard, might be unintentional and should be brought to the attention of the developer.

In the few examples below, the goal is to predict whether Frama-C's value analysis, with its new default setting in Fluorine, would emit an alarm for an overflow. For extra credit, a secondary goal is to predict whether another static analyzer that warns for all overflows would warn. We assume a 32-bit `int` and a 16-bit `short`, same as (almost) everybody has.

```c
int a = 50000;
int b = 50000;
int r = a * b;
```

```c
unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;
```

```c
int a = 50000;
int b = 50000;
unsigned int r = a * b;
```

```c
short a = 30000;
short b = 30000;
short r = a * b;
```

```c
unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;
```

## Answers

```c
int a = 50000;
int b = 50000;
int r = a * b;
```

There is an overflow in this snippet (mathematically, 50000 * 50000 is 2500000000, which does not fit in an `int`). This overflow is undefined, so the value analysis warns about it.

```c
unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;
```

The multiplication is an `unsigned int` multiplication, and when the mathematical result of unsigned operations is out of range, the C99 standard mandates that it wrap around. Technically, the C99 standard says “A computation involving unsigned operands can never overflow, …” (6.2.5:9), but we are using the word “overflow” with the same meaning as everyone outside the C standardization committee, including Wikipedia editors.

To sum up, in the C99 standard, overflows in signed arithmetic are undefined and there are no overflows in unsigned arithmetic (meaning that unsigned computations wrap around).

```c
int a = 50000;
int b = 50000;
unsigned int r = a * b;
```

The multiplication is again a signed multiplication. It does not matter that the result is destined for an `unsigned int` variable, because in C, types are inferred bottom-up. So the value analysis warns about an undefined overflow in the multiplication here.

```c
short a = 30000;
short b = 30000;
short r = a * b;
```

There is no overflow in the multiplication here, because the last line behaves as `short r = (short) ((int) a * (int) b);`. The justification for this behavior can be found in clause 6.3.1 of the C99 standard, about conversions and promotions (the general idea is that arithmetic never operates on types smaller than `int` or `unsigned int`; smaller types are implicitly promoted before arithmetic takes place).
The product 900000000 does fit in the type `int` of the multiplication. But then there is a conversion when the `int` result is assigned to the `short` variable `r`. This conversion is implementation-defined, so the value analysis does not warn about it, but another static analyzer may choose to warn about this conversion.

```c
unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;
```

Perhaps contrary to expectations, there is an undefined overflow in the multiplication `a * b` in this example. Right in the middle of the aforementioned clause 6.3.1 of the C99 standard, on the subject of the promotion of operands with types smaller than `int`, the following sentence can be found:

> If an `int` can represent all values of the original type, the value is converted to an `int`; otherwise, it is converted to an `unsigned int`.

All values of a 16-bit `unsigned short` fit in a 32-bit `int`, so the third line is interpreted as `unsigned int r = (unsigned int) ((int) a * (int) b);`.

Incidentally, things would be completely different in this last example if `int` and `short` were the same size, say if `int` were 16-bit. In that case, the third line would be equivalent to `unsigned int r = (unsigned int) a * (unsigned int) b;` and would only contain an unsigned, innocuous overflow.

## Wrapping up

In Fluorine, the option that activates or deactivates the emission of these undefined arithmetic overflow alarms is called `-warn-signed-overflow` (the opposite version, for no alarms, being `-no-warn-signed-overflow`). I felt that providing this piece of information earlier would have rendered the quiz too easy.

Although Frama-C's value analysis adheres to the semantics of C and only warns for undefined overflows, it is possible to use Frama-C to check for the other kinds of overflows, by using another plug-in, Rte, to automatically annotate the target C program with ACSL assertions that express the conditions for overflows. Note that the post describing this technique pre-dates the Fluorine release and is written in terms of the old options.

## Comments

Have you found examples where someone wants to know about defined overflows? I was thinking about this lately while reading about some integer overflow bugs in Java, where of course everything is well-defined. Although C's undefined behaviors are usually a curse, at least they let us report bugs with a bit more confidence...

Hello John,

the only kind of C code that I know of where an analyzer that warns for all integer overflows would not suffer from the dreaded “subjective false positive” problem for real but intended overflows is code that was written from the start to comply with coding rules that discourage all kinds of overflows, e.g. MISRA C. Code that was not written to avoid or explicitly “justify” overflows from the beginning seems to contain many of them. There are more opportunities to rely on them than one might think:

- pseudo-random number generators,

- multi-precision arithmetic,

- cycling over the values of a variable of narrow width such as char or short,

- and as you well know, implementing an interpreter for a language that, like Java, guarantees two's complement result for its signed integer types.

This post was written as a before-the-fact write-up (a write-down?) for Frama-C's participation in NIST's SATE V, and would you believe it, despite there being some “synthetic” test cases (“synthetic” = from the Juliet test suite) that explicitly test for overflows, including harmless defined ones, there are also test cases that involuntarily include undefined signed overflows while testing something else. And this suite has been subjected to many static analyzers. My conclusion, for what it is worth, is that if there were any hope of sorting the involuntary-overflow wheat from the intentional-overflow chaff, an analyzer would have done it already. The only hope would be to convince developers to renounce intentional overflows, but that has been tried already (cf. MISRA C) and they did not like it. Rules discouraging overflows in a language like Java would make even less sense, since overflows there are fully defined and architecture-independent.