## Checking for overflows operation by operation

By pascal on Friday, February 10 2012, 10:32 - Permalink

My colleague Bernard Botella pointed out an interesting example in an offline discussion following the last quiz.

## The setup

Consider the snippet:

int s; unsigned u1, u2; ... s = u1 - u2;

The programmer's intention with the assignment is to compute, in variable `s`

of type `int`

, the mathematical difference between the value of `u1`

and the value of `u2`

. You would expect that, using automatic plug-ins in Frama-C, ey would be able to check that the initial values for `u1`

and `u2`

are in ranges for which this is always what happens.

In fact, this looks like a perfect assignment for Rte and the value analysis. Rte can generate assertions for the various conversions and the value analysis can check whether the conditions are satisfied, following the method outlined in a recent post.

This is, after all, exactly what static analyzers — and coding rules that forbid overflows — are for.

## An obvious false positive

The overflow in the subtraction `u1 - u2`

looks like it can be justified away. After all, such a warning is emitted when `u1==3`

and `u2==5`

, and `s`

will still receive the expected value `-2`

after the implementation-defined conversion from the value of expression `u1 - u2`

, that is, `UINT_MAX - 1`

, to `int`

.

The programmer may think “okay, I can see why I get this warning, it happens as soon as `u2`

is larger than `u1`

, but this is defined and, in my case, innocuous. I only want to check that the mathematical difference fits in `s`

. The check about the absence of overflow in the conversion to `int`

is what I am really interested in.”

Say that the programmer is using the Rte plug-in to express the conditions for both the subtraction and the implicit conversion to `int`

to be safe, pretty-printing an annotated program as a result:

$ frama-c -rte-unsigned-ov -rte t.c -print ... void main(void) { /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */ /*@ assert rte: u1-u2 ≥ 0; */ s = (int)(u1 - u2); return; }

Since the programmer has seen that the second assertion, about an overflow in the subtraction `u1 - u2`

, is too restrictive, preventing use with `u1==3`

and `u2==5`

, ey removes it.
The programmer instead focuses on making sure that there is no overflow in the conversion to `int`

of the difference, and feels confident if ey sees that there isn't any warning about that. It seems normal, right?

void main(void) { /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */ s = (int)(u1 - u2); return; }

## But…

Consider the case where `u1`

is very small, `u2`

is very large, and the value analysis has been able to infer this information relatively precisely. We assume 0 ≤ u1 ≤ 10 and 3000000000 ≤ u2 ≤ 4000000000 for this example:

/*@ requires 0 <= u1 <= 10 ; requires 3000000000 <= u2 <= 4000000000 ; */ void main(void) { /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */ s = (int)(u1 - u2); return; }

In this case, the value analysis does not warn about the conversion to `int`

...

$ frama-c -val -lib-entry u.c ... u.c:8:[value] Assertion got status valid.

... but the value in `s`

is not the mathematical difference between `u1`

and `u2`

.

## Conclusion

The conclusion is that looking at overflows atomically is not a silver bullet. Granted, if all possible overflows in the program are verified to be impossible, then machine integers in the program observationally behave like mathematical integers. But the more common case is that some overflows do in fact happen in the program and are justified as benign by the programmer. The issue pointed out by Bernard Botella is that it is possible to justify an overflow, for which the rule seems too strict, by exhibiting input values for which the expected result is computed, and then to be surprised by the consequences.

Perhaps in another post, we can look at ways around this issue.

## Comments

Pascal, have you read the part about "fussy overflows" in the (or in one of the) As-If Infinitely Ranged papers? I only know of a handful of papers that have a reasonably nuanced view of integer overflow issues and this is one of the best.

You meant "automatically" instead "atomically", right?

Wouldn't a solution be:

if (u1>u2) { return u1-u2; } else { return u2-u1 ; }

this way the intention of the programmer becomes clear and you still can use Rte.

If I am not mistaken, the second problem is that in this architecture neither 3000000000 nor 4000000000 can be represented in an unsigned int. These values wrap around to fit in the unsigned int and the result cannot be the mathematical difference. As this is expressed in ACSL code, the Rte plugin cannot generate assertions to check that. right?

Hi, Sylvain and John. You are two persons that I thought might react to this post (along with Bernard who basically provided the idea behind it).

John:

Thanks for the suggestion. I'm placing the paper in my pile.

Sylvain:

"atomically" is the word I intended, to mean that possible overflows are looked at, and possibly justified as necessary but benign, one by one. I admit that I am not sure it's the best word in context.

The example is for a 32-bit architecture, where the range between 3 billions and 4 billions can be represented on an unsigned int. However, with the values in the last example, u1 - u2 ends up being computed as a small positive number, so that the cast to int seems safe to the value analysis.

Let's see if I could think the situation through right.

First, OK that 3000000000-4000000000 can be represented as unsigned int on 32bits arch., but not as a int: it is above INT_MAX which is 2147483647. The result can never give the correct mathematical value back for this range of input.

Second, I think that two unsigned integers are subtracted by adding the left operand with the right operand converted to the inverse signed integer value. So for all practical purpose u1 - u2 is equivalent to u1 + (unsigned int) ( - ((int)u2) ). In this form it becomes obvious that (int)u2 overflows for values above MAX_INT.

This is why my suggestion was too naive: it doesn't take this risk of hidden overflow into account.

I am curious to see what you propose as ways to handle this issue? One obvious is an ACSL precondition on u2.

The result may be expected to be negative, so its valid range is INT_MIN..INT_MAX. But the reasoning stays the same. And I meant INT_MAX and not MAX_INT, of course.

I was curious to check how the IEEE defines subtraction of unsigned integers. You can find the relevant standard VHDL library here: http://www.csee.umbc.edu/portal/hel... Look for "Id: A.9".

The subtraction logic is implemented as <vhdl> return ADD_UNSIGNED (L01,not(R01),'1'); </vhdl>. In this hardware description, "not(R01)" inverts the vector R01 and the third argument is the carry bit, which being set to '1' effectively adds one to R01. In essence it adds to the left argument L01 the 2's complement of the right argument R01, all that in an unsigned integers setting. So I am now confident that indeed .

L01 - R01.

is equivalent to.

L01 + (unsigned int) ( - ((int)R01) ).

on architectures with IEEE 2's complement compatible architectures.

.

Don't you think that the Rte generated annotations should contain.

/*@ assert rte: u2 <= INT_MAX; */ .

to prevent against this un-obvious overflow risk?

.

I had never put more thought about this subject before. I really appreciate the incentive that your blog gives to dig deeper in the semantic of a language that one can practice during years, but still not really understand!

@Sylvain

The behavior of u1-u2 in C99 is primarily defined by 6.5.2:9

[...] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

You can offer alternative definitions of unsigned subtraction with more intermediate steps and then ask whether the intermediate steps deserve a warning, but the idea underlying this post is precisely that individually looking at sub-operations is not an efficient use of human time: sometimes the intermediate steps do overflow in a defined way and these overflows are harmless in the overall computation. If you choose the approach of preventing these, then the verifier will have to justify each harmless overflow, wasting time that could instead have been spent making sure the software works. Worse, some intermediate operations can both overflow harmlessly and overflow dangerously.

Emitting an assertion "u2 <= INT_MAX" would prevent from subtracting 3000000000U from 4000000000U, and there is no reason to prevent that. The assertion "u1-u2 ≥ 0" emitted by Rte corresponds to the conditions when the "is reduced modulo" part of 6.5.2:9 does not have to be applied to the result.

hum. Ugly, i must concede. but you can obtain the same "behavior" with something like that :

"frama-c-gui -rte -then -val -lib-entry -main function test.c", where "function" contains the few incriminated lines)

the result will be the same, I think (perhaps it's a naive hypothesis ?).

@Jean-Marc I am only breaking it down in small steps for the purpose of showing the reasoning. The pros do it in one command-line like the one you show (although I am not sure of the order of arguments. I would put the "test.c" before the "-then").