Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Checking for overflows operation by operation

My colleague Bernard Botella pointed out an interesting example in an offline discussion following the last quiz.

The setup

Consider the snippet:

int s;
unsigned u1, u2;
...
s = u1 - u2;

The programmer's intention with the assignment is to compute, in variable s of type int, the mathematical difference between the value of u1 and the value of u2. You would expect that, using automatic plug-ins in Frama-C, ey would be able to check that the initial values for u1 and u2 are in ranges for which this is always what happens.

In fact, this looks like a perfect assignment for Rte and the value analysis. Rte can generate assertions for the various conversions and the value analysis can check whether the conditions are satisfied, following the method outlined in a recent post.

This is, after all, exactly what static analyzers — and coding rules that forbid overflows — are for.

An obvious false positive

The overflow in the subtraction u1 - u2 looks like it can be justified away. After all, such a warning is emitted when u1==3 and u2==5, and s will still receive the expected value -2 after the implementation-defined conversion from the value of expression u1 - u2, that is, UINT_MAX - 1, to int.

The programmer may think “okay, I can see why I get this warning, it happens as soon as u2 is larger than u1, but this is defined and, in my case, innocuous. I only want to check that the mathematical difference fits in s. The check about the absence of overflow in the conversion to int is what I am really interested in.”


Say that the programmer is using the Rte plug-in to express the conditions for both the subtraction and the implicit conversion to int to be safe, pretty-printing an annotated program as a result:

$ frama-c -rte-unsigned-ov -rte t.c -print
...
void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  /*@ assert rte: u1-u2 ≥ 0; */
  s = (int)(u1 - u2);
  return;
}

Since the programmer has seen that the second assertion, about an overflow in the subtraction u1 - u2, is too restrictive, preventing use with u1==3 and u2==5, ey removes it. The programmer instead focuses on making sure that there is no overflow in the conversion to int of the difference, and feels confident if ey sees that there isn't any warning about that. It seems normal, right?

void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  s = (int)(u1 - u2);
  return;
}

But…

Consider the case where u1 is very small, u2 is very large, and the value analysis has been able to infer this information relatively precisely. We assume 0 ≤ u1 ≤ 10 and 3000000000 ≤ u2 ≤ 4000000000 for this example:

/*@ requires 0 <= u1 <= 10 ;
    requires 3000000000 <= u2 <= 4000000000 ; */
void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  s = (int)(u1 - u2);
  return;
}

In this case, the value analysis does not warn about the conversion to int ...

$ frama-c -val -lib-entry u.c
...
u.c:8:[value] Assertion got status valid.

... but the value in s is not the mathematical difference between u1 and u2.

Conclusion

The conclusion is that looking at overflows atomically is not a silver bullet. Granted, if all possible overflows in the program are verified to be impossible, then machine integers in the program observationally behave like mathematical integers. But the more common case is that some overflows do in fact happen in the program and are justified as benign by the programmer. The issue pointed out by Bernard Botella is that it is possible to justify an overflow, for which the rule seems too strict, by exhibiting input values for which the expected result is computed, and then to be surprised by the consequences.

Perhaps in another post, we can look at ways around this issue.

Comments

1. On Friday, February 10 2012, 15:51 by John Regehr

Pascal, have you read the part about "fussy overflows" in the (or in one of the) As-If Infinitely Ranged papers? I only know of a handful of papers that have a reasonably nuanced view of integer overflow issues and this is one of the best.

2. On Friday, February 10 2012, 15:55 by sylvain

You meant "automatically" instead "atomically", right?

Wouldn't a solution be:
if (u1>u2) { return u1-u2; } else { return u2-u1 ; }
this way the intention of the programmer becomes clear and you still can use Rte.

If I am not mistaken, the second problem is that in this architecture neither 3000000000 nor 4000000000 can be represented in an unsigned int. These values wrap around to fit in the unsigned int and the result cannot be the mathematical difference. As this is expressed in ACSL code, the Rte plugin cannot generate assertions to check that. right?

3. On Friday, February 10 2012, 16:37 by pascal

Hi, Sylvain and John. You are two persons that I thought might react to this post (along with Bernard who basically provided the idea behind it).

John:

Thanks for the suggestion. I'm placing the paper in my pile.

Sylvain:

"atomically" is the word I intended, to mean that possible overflows are looked at, and possibly justified as necessary but benign, one by one. I admit that I am not sure it's the best word in context.

The example is for a 32-bit architecture, where the range between 3 billions and 4 billions can be represented on an unsigned int. However, with the values in the last example, u1 - u2 ends up being computed as a small positive number, so that the cast to int seems safe to the value analysis.

4. On Saturday, February 11 2012, 15:10 by sylvain

Let's see if I could think the situation through right.

First, OK that 3000000000-4000000000 can be represented as unsigned int on 32bits arch., but not as a int: it is above INT_MAX which is 2147483647. The result can never give the correct mathematical value back for this range of input.

Second, I think that two unsigned integers are subtracted by adding the left operand with the right operand converted to the inverse signed integer value. So for all practical purpose u1 - u2 is equivalent to u1 + (unsigned int) ( - ((int)u2) ). In this form it becomes obvious that (int)u2 overflows for values above MAX_INT.

This is why my suggestion was too naive: it doesn't take this risk of hidden overflow into account.

I am curious to see what you propose as ways to handle this issue? One obvious is an ACSL precondition on u2.

5. On Sunday, February 12 2012, 00:11 by sylvain

The result may be expected to be negative, so its valid range is INT_MIN..INT_MAX. But the reasoning stays the same. And I meant INT_MAX and not MAX_INT, of course.

I was curious to check how the IEEE defines subtraction of unsigned integers. You can find the relevant standard VHDL library here: http://www.csee.umbc.edu/portal/hel... Look for "Id: A.9".

The subtraction logic is implemented as <vhdl> return ADD_UNSIGNED (L01,not(R01),'1'); </vhdl>. In this hardware description, "not(R01)" inverts the vector R01 and the third argument is the carry bit, which being set to '1' effectively adds one to R01. In essence it adds to the left argument L01 the 2's complement of the right argument R01, all that in an unsigned integers setting. So I am now confident that indeed .
L01 - R01.
is equivalent to.
L01 + (unsigned int) ( - ((int)R01) ).
on architectures with IEEE 2's complement compatible architectures.
.
Don't you think that the Rte generated annotations should contain.
/*@ assert rte: u2 <= INT_MAX; */ .
to prevent against this un-obvious overflow risk?
.
I had never put more thought about this subject before. I really appreciate the incentive that your blog gives to dig deeper in the semantic of a language that one can practice during years, but still not really understand!

6. On Sunday, February 12 2012, 17:59 by pascal

@Sylvain

The behavior of u1-u2 in C99 is primarily defined by 6.5.2:9

[...] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

You can offer alternative definitions of unsigned subtraction with more intermediate steps and then ask whether the intermediate steps deserve a warning, but the idea underlying this post is precisely that individually looking at sub-operations is not an efficient use of human time: sometimes the intermediate steps do overflow in a defined way and these overflows are harmless in the overall computation. If you choose the approach of preventing these, then the verifier will have to justify each harmless overflow, wasting time that could instead have been spent making sure the software works. Worse, some intermediate operations can both overflow harmlessly and overflow dangerously.

Emitting an assertion "u2 <= INT_MAX" would prevent from subtracting 3000000000U from 4000000000U, and there is no reason to prevent that. The assertion "u1-u2 ≥ 0" emitted by Rte corresponds to the conditions when the "is reduced modulo" part of 6.5.2:9 does not have to be applied to the result.

7. On Monday, February 13 2012, 17:02 by Jean-marc

hum. Ugly, i must concede. but you can obtain the same "behavior" with something like that :
"frama-c-gui -rte -then -val -lib-entry -main function test.c", where "function" contains the few incriminated lines)

the result will be the same, I think (perhaps it's a naive hypothesis ?).

8. On Tuesday, February 14 2012, 00:24 by pascal

@Jean-Marc I am only breaking it down in small steps for the purpose of showing the reasoning. The pros do it in one command-line like the one you show (although I am not sure of the order of arguments. I would put the "test.c" before the "-then").