## Checking for overflows, revisited once

By pascal on Tuesday, February 14 2012, 00:25 - Permalink

I do not have any solution I am 100% happy with to the overflow dilemma in the previous post. Here is one of the solutions that does not make me 100% happy.

The first (partial) solution is: program so that overflows correspond exactly to unwanted circumstances (and then it becomes meaningful to verify the absence of overflows. There are great automatic tools for that. Have you heard of them?)

“Program so that…” sounds abstract, but it may be as simple as using larger integer types that won't overflow for all intermediate computations:

s = (int)((long long) u1 - (long long)u2);

Then the desired property (variable `s`

contains the mathematical difference of `u1`

and `u2`

) entirely rests on the final conversion from `long long`

to `int`

.

This solution doesn't apply in the common situation where the code is already written and **gratuitous changes to it are prohibited**. Also, you need to have an integer type large enough to contain all intermediate computations without overflows. In this example, `unsigned int`

is 32-bit and `long long`

is 64-bit, but the standard does not prevent `unsigned int`

to be 64-bit and the same size as `long long`

, in which case the conversion from `unsigned int`

to `long long`

could overflow. Rte would only insert additional assertions for the two intermediate conversions that we have added in the assignment. This wouldn't help; it would only make things worse.

“Won't the generated code be less efficient?”, you rightly ask. That all depends on the compiler. Optimizing compilers have no trouble at all optimizing the above assignment so that it uses only one 32-bit subtraction. On the Linux workstation on which I tried this, both `clang -m32 -O0`

and `gcc -m32 -O0`

did. It is amusing, because the code generated by both compilers with `-O0`

is ugly (gratuitously moving registers to memory and then to registers again) but, even with optimizations disabled, they still both get the fact that it is only necessary to compute the least significant 32 bits of the result.

With optimizations, both Clang and GCC generate clean code that naturally still uses a single 32-bit subtraction:

$ gcc -S -O2 -m32 t.c $ cat t.s ... f: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax subl 12(%ebp), %eax popl %ebp ret

As an anecdote, when generating x86-64 code (and therefore having 64-bit subtraction available as a single instruction, with only the drawback of a slightly larger encoding),

`gcc -O0`

keeps using 32-bit subtraction but`clang -O0`

uses 64-bit subtraction. This does not reveal anything about the compilers: it does not make sense to compare for efficiency the code they generate with optimization disabled.

That's all for this post.

## Comments

int s;

unsigned u1, u2;

...

s = u1 - u2;

assert((s < 0) == (u1 < u2));

...is one way of correctly asserting in this situation, which covers all relevant overflows assuming 2's complement arithmetic.

@nick

Nice.

Your comment reminds me that I have not finished this series. I was supposed to show alternative solutions in additional blog posts. In fact, the next post I was going to write was on the topic of using our ACSL annotation language to specify that the intended result of the computation is that s contains the mathematical difference between u1 and u2. In ACSL, computations take place within unbounded integers, so that the ACSL notation for the previous sentence is simply:

//@ assert s == u1 - u2 ;

In this annotation, u1 - u2 means “the mathematical difference between the program variables u1 and u2 at this point of the program”.

This may be a little confusing at first, because it looks like the ACSL assertion is trivially true after the assignment in the program (it is not). But one often needs unbounded integers (and sometimes more advanced mathematical notions yet) in annotations, hence the design choice. C computations can be expressed in ACSL with intermediate casts. For instance, the annotation /*@ assert s == (int) (unsigned int) (u1 - u2) ; */ would be trivially true, because it expresses exactly what the assignment s = u1 - u2; in the program does when u1 and u2 are unsigned ints.

Anyway, that's the way ACSL works, and in this case, you can't blame it for not elegantly expressing the intended specification. Once the ACSL assertion has been written, one can attempt to verify it statically with the Wp or Jessie Frama-C plug-ins (and fail, indicating that the assertion does not always hold, unless enough information is provided on the respective ranges of u1 and u2 to conclude that the assertion does in fact hold). One may also use constraint propagation techniques (with a closed-source, undistributed plug-in) to generate a counter-example that pin-points the problem more concretely than Wp or Jessie do.

Finally, and taking us back to your comment, the assertion /*@ assert s == u1 - u2 ; @/ fits in the executable subset[1] of ACSL, E-ACSL, so that yet another plug-in can translate the annotation into an executable assertion that can be checked at run-time. The translation relies on the GMP library for computing with unbounded integers. It is not as nice as your hand-crafted assertion —which does not require GMP— but the translation works for all annotations within the E-ACSL sublanguage.

[1] http://frama-c.com/eacsl.html