Frama-C news and ideas


A bit of explanation regarding the quiz in the last post

There are no negative integer constants in C, as per section 6.4.4 of the C99 standard:

integer-constant:
        decimal-constant integer-suffix_opt
        octal-constant integer-suffix_opt
        hexadecimal-constant integer-suffix_opt
decimal-constant:
        nonzero-digit
        decimal-constant digit
octal-constant:
        0
        octal-constant octal-digit
hexadecimal-constant:
        hexadecimal-prefix hexadecimal-digit
        hexadecimal-constant hexadecimal-digit
...

The minus sign is not part of the constant according to the grammar.


The expression -0x80000000 is parsed and typed as the application of the unary negation operator - to the constant 0x80000000. The table in section 6.4.4.1 of the standard shows that unsigned types must be tried when typing hexadecimal constants: the candidate types are, in order, int, unsigned int, long, unsigned long, long long, and unsigned long long.

On architectures where int is 32 bits wide, the first candidate type that can represent 0x80000000 is unsigned int. Unary negation applied to an unsigned int yields an unsigned int (the result is computed modulo 2^32), so -0x80000000 has type unsigned int and value 0x80000000.


Following the same reasoning, the "Decimal Constant" column of the table in the C99 standard lists int, long, and long long as the types to try. This might lead you to expect the expression -2147483648, compiled with GCC, to have the value -2147483648. Instead, on a 32-bit architecture, GCC emits a warning and the expression evaluates to 2147483648. The warning is:

t.c:6: warning: this decimal constant is unsigned only in ISO C90

There is indeed a subtlety here for 32-bit architectures: at the time of writing, GCC applied C90 rules by default. It is not that the spirit of the table in section 6.4.4.1 changed between C90 and C99. In both standards, unsigned types are tried for octal and hexadecimal constants, while for decimal constants mostly only signed types are tried. Here is the relevant snippet from the C90 standard:

The type of an integer constant is the first of the corresponding list in which its value can be represented. Unsuffixed decimal: int, long int, unsigned long int;

The difference really stems from the fact that C90 did not have a long long type, so the list of types to try for a decimal constant ended with unsigned long, the only type left that could hold values too large for long. On a 32-bit architecture, where int and long are both 32 bits wide, 2147483648 fits in neither int nor long, and so ends up typed as unsigned long. Note that on an architecture where long is 64 bits wide, both 2147483648 and -2147483648 are typed as long.


Finally, when GCC is told with the option -std=c99 to apply C99 rules on an architecture where long is 32 bits wide, 2147483648 is typed as long long, so that the expression -2147483648 has type long long and value -2147483648.


This should explain the results obtained when compiling the three programs from the last post with GCC on 32-bit and on 64-bit architectures.

Comments

1. On Friday, January 20 2012, 16:04 by John Regehr

Nice quiz. I've been writing C for close to 20 years and just the other day got bit by a constant problem: I forgot an L or a U or both. Go's arbitrary precision in constant calculations seems like an improvement...

2. On Friday, January 20 2012, 17:58 by pascal

Hello, John. Glad you liked it! The post has also generated offline discussions that will translate to two additional posts on the same general subject. Who would think that C programs that manipulate integers can cause so many aha moments?

3. On Monday, February 13 2012, 16:53 by sylvain

It is worth noting that MISRA-C:2004 has two rules related to these cases:
- 10.6 A “U” suffix shall be applied to all constants of unsigned type.
- 12.9 The unary minus operator shall not be applied to an expression whose underlying type is unsigned.
Your post gives a good example of the wisdom of these rules, which forbid the expression -0x80000000 altogether.