Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Compiler-driven language development

A quiz

What is pressing “return” next below going to reveal GCC has done wrong?

$ cat t.c
#include <limits.h>
int x = -1;

int main(int c, char **v) {
  x = INT_MIN % x;
  return x;
}
~ $ gcc -std=c99 t.c
~ $ ./a.out

Answer:

$ ./a.out 
Floating point exception

The programmer was using an x86_64 processor

GCC has compiled a perfectly well-defined C99 program for returning 0 into binary code that makes an error. The error is misdiagnosed as a floating point exception but is actually a division overflow. It happens because computing the C99 remainder operation % on this computer invokes the x86_64 instruction idivl, which can raise this error when invoked with the arguments passed to it in this program. The idivl instruction computes a quotient and a remainder at the same time; the overflow error relates to the computation of the quotient, which my program was going to throw away anyway.


My program was well-defined in C90, too. In both these versions of the C standard, the semantics of %, like other arithmetic operations, are described informally as first computing the result as unbounded integer and then, since the result 0 fits the signed destination type int, the operation is defined and the result is 0.

Over the period of time it has been generating code for x86 computers, GCC could have become standard-compliant by adding the couple of instructions necessary to compute the standard-mandated result. The most efficient sequence I see would be to test the divisor for equality with -1 and to conditionally move if equal the divisor into the dividend. This would compute -1 % -1 instead of dividend % -1, always producing the same result 0 without the need for an expensive branch.

GCC would not even have needed to generate this code all the time. Most of the times the divisor is statically known not to be -1 or the dividend is statically known not to be INT_MIN. In either case the guarding test is unnecessary.

To be fair, GCC can generate sophisticated assembly code when it exerts itself. If a program uses both dividend % divisor and dividend / divisor nearby in the same function, the generated assembly code is likely to call the division instruction idivl only once. My proposed conditional move would interfere with this optimization when it applies.

What it must be like to standardize a language like C

Good news! GCC held out, and now (in the C11 standard), INT_MIN % -1 is officially undefined behavior, so GCC is allowed to do what it wants. This goes to show two things:

  1. Revisions of the C standard are carefully crafted to allow as many compliant programs as possible to continue to be considered compliant, but this is only one of several conflicting objectives. My program is C99-compliant and is undefined in C11.
  2. Sometimes the standard defines the future direction of the language (e.g. the memory model part of C11) but sometimes it only ratifies whatever non-compliant semantics compilers have been implementing for years.

Comments

1. On Saturday, November 17 2012, 17:30 by John Regehr

Sometimes I've wondered what are the cases where the C standards people have intentionally rendered an existing, important C implementation as non-compliant. It seems like this probably hasn't happened very many times.

2. On Saturday, November 17 2012, 19:44 by pascal

John,

I first read your comment as “what are the cases where the C standards people have intentionally rendered an existing, important C piece of code non-compliant?”

My initial reaction was thus surprise. Of course there are pieces of code that new revisions of the standard have been rendered non-compliant! Kernighan and Ritchie's book, of all paradoxical repositories of source code, is full of examples that are no longer standard-compliant.

And then I realized you meant C compilers. Oh.

I could build a couple of snarky remarks from that, but it wouldn't be fair. The committee tries to please its customers; its customers are compiler makers. Compiler makers mostly try to please their own customers, C users. The first edition K&R examples that do not comply with recent standards are still accepted by many compilers.

Except, perhaps, when compiler makers are aggressively taking advantage of undefined behavior; but everyone needs to have a hobby. If compiler maker were not doing that, how would they spend their time?

3. On Wednesday, November 21 2012, 09:46 by Nhat Minh Lê

Hmm, I am somewhat puzzled over your claim that INT_MAX % -1 is well-defined in both C89 and C99. Could you quote the relevant part of the standard that shows that "the semantics of %, like other arithmetic operations, are described informally as first computing the result as unbounded integer and then, since the result 0 fits the signed destination type int, the operation is defined and the result is 0?"

All copies of the three standards (drafts) that I have state, about the multiplicative operators: "If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a." C11 adds "otherwise, the behavior of both a/b and a%b is undefined." However, even before the change, I would argue that the undefined behavior was implicit (as is any behavior not explicitly defined by the standard). At most, I would say that it was ambiguously defined.

4. On Wednesday, November 21 2012, 11:46 by pascal

Hello,

I was referring to the wording of 6.5:5 in C99, where the emphasis is mine:

“If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.”

This is the clause that makes INT_MIN / (-1) undefined behavior. It does not make INT_MIN % (-1) undefined because INT_MIN % (-1) is mathematically defined (it evaluates to 0) and is in the range of representable values for its type.

The informative section J.2 of C99 lists as undefined behavior:

- An exceptional condition occurs during the evaluation of an expression (6.5).

- The value of the second operand of the / or % operator is zero (6.5.5).

It does not list MIN_INT % (-1), and the words “exceptional condition” refer to clause 6.5:5, where they are defined as “if the result is not mathematically defined or not in the range of representable values for its type”.

Implicitly assuming that anything not explicitly made defined by the standard is undefined is not very workable in practice. If the sentence “If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a.” was intended to tell when a%b is defined (*), then what about addition? Addition + is not defined by any such equation, so if the equation in 6.5.5:6 makes % defined for all values other than those, by the same reasoning, should a+b be considered undefined for all arguments? No: the meaning of + is defined informally by the words “The result of the binary + operator is the sum of the operands.”. Similarly, the meaning of % is defined by “the result of the % operator is the remainder.”. The “implicitly undefined unless explicitly defined” stance will work when the C standard is a formal, executable description. Until then, it will be a matter of interpretation, because in the strictest sense the standard does not guarantee anything: it is only ambiguous English words.

Speaking of interpretation, in the following thread, in comment 8, Joseph S. Myers acknowledges that INT_MIN % (-1) is defined in C99 when he says “-std=c99 and other conformance options, for languages where INT_MIN % -1 is defined”. The situation was clarified in C11, but C99 was already published and does not say that INT_MIN % -1 is undefined, whatever the 2009 committee thought was intended 10 years earlier.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30484

(*) I do not think that this sentence says anything about the definedness of a%b. This is only a corollary of your argument. I think that the sentence in question is a logical implication. If a certain property holds, then another property must hold.

5. On Wednesday, November 21 2012, 13:25 by pascal
This discussion is a close match for the one happening in the comments here. From the way votes went in this discussion and generally in the question, obviously each of the two interpretations sound reasonable to different persons. Let us just be glad that the situation was clarified in C11 then.