Frama-C news and ideas


Tag - undefined behavior


Wednesday, October 9 2013

The overflow when converting from float to integer is undefined behavior

Integer overflows in C

A previous post on this blog was a reminder that in C, signed integer arithmetic overflow is undefined behavior. In contrast, the behavior of overflows in conversions from integer type to signed integer type is implementation-defined. The C99 standard allows for an implementation-defined signal to be raised, but in practice, the widespread compilation platforms provide two's complement behavior. And you can trust that they will continue to do so, because it's implementation-defined. Compiler makers cannot change their mind willy-nilly as if it were undefined behavior:

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
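
For instance, on the widespread platforms where int is 32-bit and two's complement, the conversion below has a well-defined, documented result (a minimal sketch; the exact value printed assumes those platform characteristics):

#include <stdio.h>

int main(void)
{
  unsigned int u = 3000000000u; /* does not fit in a 32-bit int */
  int i = u;                    /* implementation-defined (6.3.1.3:3), not undefined */
  printf("%d\n", i);            /* prints -1294967296 with two's complement behavior */
  return 0;
}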

Floating-point overflows in C

The C standard does not mandate IEEE 754 floating-point arithmetic. Still, in practice, modern compilation platforms, if they provide floating-point features at all, provide either exactly IEEE 754 binary32 and binary64 formats and computations, or the same formats and a close approximation of the same computations.

IEEE 754 floating-point defines +inf and -inf values, so that any real number can be approximated in the target IEEE 754 format (albeit, when it ends up represented as an infinity, not precisely). This means that for C compilation platforms that implement IEEE 754 for floating-point, the condition “the value can be represented in the new type” is always true. There is no reason to worry about undefined behavior caused by overflow either in floating-point arithmetic or in the conversion of a double to a float.
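
A minimal sketch of the double-to-float case, assuming an IEEE 754 platform where the overflowing conversion produces an infinity rather than undefined behavior:

#include <stdio.h>

int main(void)
{
  double d = 1e300;  /* representable as a double */
  float f = d;       /* too large for binary32: becomes +inf on IEEE 754 platforms */
  printf("%f\n", f); /* expected to print "inf" */
  return 0;
}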

Or indeed, in a constant. Consider GCC's warning here:

$ cat t.c
#include <stdio.h>

int main()
{
  double big = 0x1.0p5000;
  printf("%f\n", big);
}

$ gcc-172652/bin/gcc -std=c99 -Wall t.c && ./a.out 
t.c: In function ‘main’:
t.c:5:3: warning: floating constant exceeds range of ‘double’ [-Woverflow]
inf

The number 2^5000, represented in C as 0x1.0p5000, is totally in the range of double, which goes up to inf. Clang similarly warns that “magnitude of floating-point constant too large for type double”. A proper warning message would be that 2^5000 cannot be represented precisely, instead of implying that it cannot be represented at all.

Floating-point ↔ integer conversion overflows in C

But enough pedantry contests with compilers. The range of floating-point representations being what it is, we are left with only overflows in conversions from floating-point to integer to consider.


Suspense… (for the reader who did not pay attention to the title)


Overflows in conversions from floating-point to integer are undefined behavior. Clause 6.3.1.4 in the C99 standard makes them so:

6.3.1.4 Real floating and integer

1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

What can happen in practice when a C program invokes this particular flavor of undefined behavior? Is it as bad as dereferencing an invalid address, is it mostly harmless like signed integer arithmetic overflow, or is it something else entirely? Let us find out.

The following program converts the double representation of 2^31, the smallest positive integer that does not fit a 32-bit int, to int.

int printf(const char *, ...);

int main()
{
  int i = 0x1.0p31;
  printf("%d\n", i);
}

Frama-C's value analysis warns about undefined behavior in this program:

$ frama-c -val t.c

warning: overflow in conversion of 0x1.0p31 (2147483648.) 
   from floating-point to integer.
   assert -2147483649 < 0x1.0p31 < 2147483648;

Fine-tuning the assertion -2147483649 < 0x1.0p31 < 2147483648 was a riot, by the way. Do you see why?

My aging (but still valiant) PowerPC-based Mac appears to think that saturation is the way to go: the variable i is set to INT_MAX:

$ gcc -std=c99 t.c && ./a.out 
2147483647

Dillon Pariente was first to draw our attention to overflow in floating-point-to-integer conversions, which raised CPU exceptions on the target processor for the code he was analyzing. I understood that target to also be a PowerPC, so I suspect the behavior must be configurable on that architecture.

Dillon Pariente's example was along the lines of float f = INT_MAX; int i = f; which is also hilarious if you are into that sort of humor.
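
Spelled out as a complete program (a sketch of my own, not Dillon's exact code), the joke is that INT_MAX has no exact binary32 representation, so it rounds up to 2^31 when stored in f, and converting that value back to int overflows:

#include <limits.h>
#include <stdio.h>

int main(void)
{
  float f = INT_MAX; /* 2147483647 rounds to 2147483648.0f */
  int i = f;         /* 2^31 does not fit an int: undefined behavior */
  printf("%d\n", i);
  return 0;
}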


In order to really show how weird things can get on Intel processors, I need to modify the test program a bit:

int printf(const char *, ...);

volatile double v = 0;

int main()
{
  int i1 = 0x1.0p31;
  int i2 = 0x1.0p31 + v;
  printf("%d %d\n", i1, i2);
}

The volatile type qualifier precludes optimization, but there is no hardware or thread to change the value of variable v. The two expressions 0x1.0p31 and 0x1.0p31 + v are both expressions of type double that evaluate to 2^31.

Still, GCC and Clang, like a single compiler, think that these two expressions needn't result in the same value when converted to int:

$ gcc t.c && ./a.out 
2147483647 -2147483648
$ clang  t.c && ./a.out 
2147483647 -2147483648

The results are different because one conversion was evaluated statically to be placed in %esi (2147483647) whereas the other was evaluated at run-time in %edx with the cvttsd2si instruction:

$ clang -S -O t.c  && cat t.s
...
_main:                                  ## @main
...
	movsd	_v(%rip), %xmm0
	addsd	LCPI0_0(%rip), %xmm0
	cvttsd2si	%xmm0, %edx
	leaq	L_.str(%rip), %rdi
	movl	$2147483647, %esi       ## imm = 0x7FFFFFFF
	xorb	%al, %al
	callq	_printf
...
L_.str:                                 ## @.str
	.asciz	 "%d %d\n"

Only undefined behavior allows GCC and Clang to produce different values for i1 and i2 here: the values of these two variables are computed by applying the same conversion to the same original double number, and should be identical if the program was defined.


Generally speaking, cvttsd2si always produces -0x80000000 in cases of overflow. That is almost like saturation, except that floating-point numbers that are too positive are wrapped to INT_MIN. One may think of it as saturating to either -0x80000000 or 0x80000000, and in the latter case, wrapping around to -0x80000000 because of two's complement. I do not know whether this rationale bears any resemblance to the one Intel's engineers used to justify their choice.


So one might think that this is the end of the story: as long as the conversion is done at run-time on an Intel platform, the compiler uses the cvttsd2si instruction. Overflows, if overflows there are, “saturate to INT_MIN” as the convention is on this platform. This can be confirmed experimentally with the following program variant:

#include <stdio.h>
#include <stdlib.h>

int main(int c, char **v)
{
  int i = 0x1.0p31 + strtod(v[1], 0);
  printf("%d\n", i);
}

This new program takes a number from the command-line and adds it to 2^31, so that there is no opportunity for compile-time evaluation. We expect the conversion to saturate to INT_MIN, and it does:

$ gcc -std=c99 t.c && ./a.out 1234 && ./a.out 12345 && ./a.out 123456
-2147483648
-2147483648
-2147483648


Wait! It gets more amusing still. Let us change the program imperceptibly:

int main(int c, char **v)
{
  unsigned int i = 0x1.0p32 + strtod(v[1], 0);
  printf("%u\n", i);
}

The behavior of run-time overflow in the conversion from double to integer changes completely:

$ gcc -m64 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999 
1234
123456
3755744407

But conversion saturates again, at zero this time, for the same program, when targeting IA-32:

$ gcc -m32 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999
0
0
0

Do you have an explanation for this one? Leave a message in the comments section below. The fastest author of a complete explanation wins a static analyzer license.

Conclusion

In conclusion, the overflow in the conversion from floating-point to integer is rather on the nasty side of C's undefined behavior spectrum. It may appear to behave consistently if the compilation targets an architecture where the underlying assembly instruction(s) saturate. Saturation is the behavior that compilers GCC and Clang implement when they are able to evaluate the conversion at compile-time. In these conditions, a lucky programmer may not actually observe anything strange.

The idiosyncrasies of other architectures may lead to very different results for overflowing conversions depending on parameters outside the programmer's control (constant propagation, for instance, is more or less efficient depending on the optimization level and may be difficult to predict, as we already complained about when discussing Clang targeting the 387 FPU).


Acknowledgements: In addition to Dillon Pariente, I discussed this topic with Boris Yakobowski, John Regehr, Stephen Canon, and StackOverflow users tenos, Sander De Dycker and Mike Seymour prior to writing this blog post.

Wednesday, July 31 2013

From Pascal strings to Python tuples

Quiz time

What does the program below do?

#include <stdio.h>

int main(){
  struct {
    int t[4];
    int u;
  } v;
  v.u = 3;
  v.t[4] = 4;
  printf("v.u=%d", v.u);
  return 0;
}

Two answers are “it prints v.u=4” and “it prints v.u=3”:

$ gcc t.c && ./a.out 
v.u=4
$ gcc -O2 t.c && ./a.out 
v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v. The problem is that v.t+4 is outside object v.t. So GCC does what it pleases, and when compiling with -O2, optimizes brutally:

$ gcc -S -O2 t.c && cat t.s
.LC0:
	.string	"v.u=%d\n"
…
	movl 	$3, %edx
	movl 	$.LC0, %esi
	movl	 $1, %edi
	xorl	%eax, %eax
	call	__printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c
t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4, but here the index is hard-coded as 4, so line 9 is only valid when 4 < 4. That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h"

typedef struct{
  int tab[1];
} ts;

int main() {
  ts *q = malloc(5*sizeof(int));
  q->tab[2]= 5;
  return 1;
}

If you guessed “invoke undefined behavior”, well done!


The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[1];

   /* ob_item contains space for 'ob_size' elements.
    * Items must normally not be NULL, except during construction when
    * the tuple is not yet visible outside the function that builds it.
    */
} PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, the C99 standard gave us “flexible array members” (a last member declared with an incomplete array type), a safe way to implement tuples and Pascal strings:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[];
…
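
For readers who have not used the construct, here is a minimal, self-contained sketch (not Python's actual code) of allocating a struct whose last member is a C99 flexible array member:

#include <stdlib.h>

typedef struct {
  size_t ob_size;   /* stand-in for PyObject_VAR_HEAD */
  int ob_item[];    /* flexible array member: incomplete array type */
} tuple_like;

tuple_like *make_tuple_like(size_t n)
{
  /* sizeof(tuple_like) does not count ob_item; add room for n elements */
  tuple_like *t = malloc(sizeof(tuple_like) + n * sizeof(int));
  if (t)
    t->ob_size = n;
  return t;
}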

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).

Monday, May 20 2013

Attack by Compiler

The title of this post, “Attack by Compiler”, has been at the back of my mind for several weeks. It started with a comment by jduck on a post earlier this year. The post's topic, the practical undefinedness of reading from uninitialized memory, and jduck's comment, awakened memories from a 2008 incident with the random number generator in OpenSSL.

As I am writing this, if I google “attack by compiler”, the first page of results includes the classic essay Reflections on Trusting Trust by Ken Thompson, Wikipedia's definition of a backdoor in computing, an article by David A. Wheeler for countering the attack described by Thompson, a commentary by Bruce Schneier on Wheeler's article, and a Linux Journal article by David Maynor on the practicality of the attack described by Thompson on the widespread GNU/Linux platform.


This post is about a slightly different idea.

Initial conditions: trustworthy compiler, peer-reviewed changes

Suppose that we start with a trustworthy compiler, widely distributed in both source and binary form. Some people are in the position to make changes to the compiler's source code and could, like Thompson in his essay, attempt to insert a backdoor in the compiler itself.

But now, each change is scrutinized by innumerable witnesses from the Open-Source community. I say “witnesses”; in fact they are mostly students, but we will probably be forced to assume that they cannot all be mischievous.

The attackers could try to obfuscate the backdoor as they insert it, but the problem is that some of the witnesses are bound to possess this character trait that the less they understand a change, the more they investigate it. Furthermore, once these witnesses have uncovered the truth, loss of credibility will ensue for the person who tried to sneak the backdoor in. This person will lose eir commit privilege to the compiler's sources, and people will recover untainted compilers in source and binary form from their archives. This kind of approach is risky and may only result in a temporary advantage—which may still be enough.

The underhanded approach

The 2013 edition of the Underhanded C Contest is under way. The contest defines underhanded code as:

code that is as readable, clear, innocent and straightforward as possible, and yet [fails] to perform at its apparent function

Underhandedness is exactly what an attacker with commit access to the source code of the widely used compiler should aim for. If the commit is underhanded enough, the committer may not only enjoy full deniability, but ey may even get the incriminating change to remain in later versions of the compiler, as a “good” change. This implies that all affected security-sensitive applications, like the “login” program in Thompson's essay, must be updated to work around the now official backdoor in the compiler. In this scenario, even after the attack has been discovered, anytime someone unknowingly compiles an old version of “login” with a recent compiler, it's another win for the attacker.


Fortunately, we agree with Scott Craver that the C programming language is a very good context to be underhanded in, and a C compiler is even better. How about the following ideas?

  1. making pseudo-random number generators that rely on uninitialized memory less random, in the hope that this will result in weaker cryptographic keys for those who do not know about the flaw;
  2. optimizing a NULL test out of kernel code when it is one of several defenses that need to be bypassed;
  3. optimizing buffer + len >= buffer_end || buffer + len < buffer overflow tests out from application code, so that buffer overflows do take place in code that is guarded thus;
  4. optimizing source code that was written to take constant time into binary code that reveals secrets by terminating early.

I am not being very original. According to this post by Xi Wang, idea (1) is only waiting for someone to give the compiler one last well-calibrated shove. The NULL test optimization was already implemented in the compiler when it was needed for a famous Linux kernel exploit. The interesting scenario would have been if someone had found that the code in tun_chr_poll() was almost exploitable and had submitted the GCC optimization to activate the problem, but it did not happen in this order. Idea (3) really happened.
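
For the record, idea (3) refers to checks of the following shape (a sketch with illustrative names): forming buffer + len past the end of the object is itself undefined, so a compiler may conclude that the wrap-around comparison can never be true and drop it.

#include <stddef.h>

/* Intended meaning: reject len values that reach or pass buffer_end.
   The wrap-around test (buffer + len < buffer) relies on undefined pointer
   arithmetic and may be optimized away. */
int overflows_unsafe(char *buffer, char *buffer_end, size_t len)
{
  return buffer + len >= buffer_end || buffer + len < buffer;
}

/* A well-defined formulation: compare lengths instead of wrapped pointers. */
int overflows_safe(char *buffer, char *buffer_end, size_t len)
{
  return len >= (size_t)(buffer_end - buffer);
}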

Idea (4) has not been exploited that I know of, but it is only a matter of time. If I google for “constant-time memcmp”, I may find an implementation such as follows:

int memcmp_nta(const void *cs, const void *ct, size_t count)
{
  const unsigned char *su1, *su2;
  int res = 0;

  for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
    res |= (*su1 ^ *su2);

  return res;
}

Nothing in the C language definition forces a compiler to compile the above function into a function that does not return as soon as variable res contains (unsigned char)-1, not to mention the possibilities if the compiler first inlines the function at a call site where its result is only compared to 0, and then optimizes the code for early termination. If I were trying to sneak in a compiler change that defeats the purpose of this memcmp_nta() function, I would bundle it with auto-vectorization improvements. It is a fashionable topic, and quite exciting if one does not care about non-functional properties such as execution time.

Conclusion

The impracticability of the described attack is counter-balanced by some unusual benefits: at the time of the attack, someone may already have audited the pseudo-random number generator or function memcmp_nta() we used as examples. The audit may have considered both source and generated assembly code, and involved actual tests, at a time when the code was “safely” compiled. But the auditor is not going to come back to the review again and again each time a new compiler comes out. Like Monty Python's Spanish Inquisition, nobody expects the compiler-generated backdoor.


Three of my four examples involve undefined behavior. More generally, my examples all involve unwarranted programmer expectations about C idioms. This is the key to plausibly deniable compiler changes that reveal latent security flaws. What other unwarranted expectations should we take advantage of for an “attack by compiler”?

Saturday, April 13 2013

Making oughts from ises

A previous post discussed the nature of uninitialized and indeterminate memory throughout the C standards. The argument was “avoid using uninitialized data, even if you think you know what you are doing; you may be right, but regardless your compiler might think it knows what you are doing and be wrong about it”. I tried to strengthen this argument by claiming that writing the example for that post was how I found my first compiler bug “by hand”.

An example involving unsequenced addition and sequenced comma operators

That bit about a compiler bug being my first, I now fear, was an exaggeration. I had found and forgotten another arguable GCC bug in 2012 while investigating sequence points. First, consider the program below.

#include <stdio.h>

int i, j;

int f(void)
{
  i++;
  return j;
}

int g(void)
{
  j++;
  return i;
}

int main()
{
  printf("%d\n", f() + g());
}

Although functions f() and g() may be called in an unspecified order, calling a function and returning from it are sequence points. The program therefore avoids the sin of undefined behavior caused by C99's clause 6.5:2.

6.5:2 Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

The program can only print an unspecified choice of either f() + g() when calling f() first or f() + g() when calling g() first, the two of which happen to result in the same value 1.

The same reasoning that makes the C program above always print 1 should make the program below always print 1 too:

$ cat t2.c
#include <stdio.h>

int i, j;

int main()
{
  printf("%d\n", (0,i++,j) + (0,j++,i));
}

But unfortunately:

$ gcc t2.c
$ ./a.out 
2

This program should print the same result as the previous program (the one with f() and g()) because the modification of j is sequenced between the 0, that precedes it and the ,i that follows it, and similarly for the modification of i. In C99 at least, the compiler must choose between adding an incremented i to the initial value of j, or the other way round. This example should print 1, in short, for the same reason that the first example in this post could be expected to print 1, or that an example that prints cos(x) * cos(x) + sin(x) * sin(x) with MT-unsafe functions cos() and sin() can still be expected to print a floating-point value close to 1.0.

Pet peeve: guessing at what the standard ought to say from what compilers do

This anecdote slowly but surely brings us to the annoyance I intended to denounce all along: programmers inferring what the standard ought to say from what compilers do. Any discussion of the kind of corner cases I described recently invites arguments about how compilers want to optimize the generated code as if uninitialized data could not be used, or as if the two operands of + were evaluated in parallel.

This is backwards. Yes, compilers want to optimize. But compilers are supposed to follow the standard, not the other way round. And section 6 of the C99 standard does not once use the word “parallel”. It says that the evaluation order of sub-expressions is unspecified (6.5:3), and that unsequenced side-effects cause undefined behavior (6.5:2), but nothing about parallel evaluation.

Never mind the example

Actually, C99's 6.5:3 clause really says “[…] the order of evaluation of subexpressions and the order in which side effects take place are both unspecified”. I am unsure about the implications of this thing about the order of side-effects. I might even, and the long-time reader of this blog will savor this uncharacteristic admission, be wrong: perhaps a compiler is allowed to generate code that prints 2 for t2.c after all.

Nevertheless, this does not detract from the point I intended to make, which is that it is bad engineering practice to take the current behavior of popular compilers as the language definition. If the C99 language allows compilers to “miscompile” (0,i++,j) + (0,j++,i), so be it. But having read the relevant parts of the standard (6.5.2.2:10, 6.5.2.2:12, and 5.1.2.3:2 in addition to the previously mentioned clauses), it seems to me that if the standard allows compilers to miscompile the above, it also allows them to miscompile f(1) + f(2) for any function f() with global side-effects.

Global side-effects include temporarily modifying the FPU rounding mode, using a statically allocated scratch buffer, initializing a table of constants at first call, calling malloc() and free(), or even random(). All these examples are intended to be invisible to the caller, and lots of library functions you routinely use may be doing them.
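
As a sketch of what is at stake (names and the scratch buffer are purely illustrative), here is a function whose only invisible side effect is a static buffer; f(1) + f(2) is expected to be 3 whichever call is made first, but not if the two calls' side effects could be interleaved:

#include <stdio.h>

static char scratch[32]; /* invisible to the caller */

int f(int n)
{
  /* write to the shared scratch buffer, then read it back */
  snprintf(scratch, sizeof scratch, "%d", n);
  return scratch[0] - '0';
}

int caller(void)
{
  return f(1) + f(2); /* 3 in either call order, if the calls do not interleave */
}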

So in this case my argument remains very much the same: that compilers do not (yet) implement aggressive optimizations in the presence of f(1) + f(2) should be no excuse for not clarifying whether the standard allows them.

Wednesday, March 13 2013

Reading indeterminate contents might as well be undefined

Warning: on a punctiliousness scale ranging from zero to ten, this post is a good nine-and-a-half. There was no tag for that, so I tagged it both “C99” and “C11”. The faithful reader will know what to expect. There is a bit of C90, too.

To summarize, it may appear that according to the letter of modern C standards, it is only dangerous to use uninitialized variables, instead of very dangerous. Nevertheless, this post shows that it does not matter what the standards say: compilers will bite you even when you are arguably right.

Some context in the form of a link

In 2012, Xi Wang wrote a nice blog post showing it is not a good idea to use an uninitialized variable as a source of additional entropy when trying to create a random seed.

“Xoring an uninitialized variable with whatever other source of entropy you already have cannot hurt”, the conventional thinking goes. Conventional thinking is wrong. Your typical modern compiler deletes the code that gathers the original entropy, since it is only going to be xored with an uninitialized variable. Hence the title of Xi Wang's blog post, More Randomness or Less.

In C90, “indeterminate” was simple

In the nineties, real men were real men, C standards were short, and reading indeterminate contents (such as uninitialized variables) was listed in the very definition of “undefined behavior”:

1.6 DEFINITIONS OF TERMS

Unspecified behavior — behavior, for a correct program construct and correct data, for which the Standard imposes no requirements.

Undefined behavior — behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements.

“Undefined behavior” means the compiler can do what it wants, so the behavior noticed by Xi Wang can in no way be held against a C90 compiler.

In 1999, C standards became more complicated

The C99 standard does not directly list “reading indeterminate contents” as undefined behavior. Instead, it defines indeterminate contents as “either an unspecified value or a trap representation”. Reading a trap representation causes undefined behavior (6.2.6.1:5). The nuance here is that the type unsigned char is guaranteed not to have any trap representations (and thus can always be used to read indeterminate contents).
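
The following sketch is the kind of code this nuance is invoked to defend: because u has type unsigned char, a C99 pedant can argue that reading it yields only an unspecified value, never undefined behavior. This is the argument examined below, not an endorsement of the idiom.

unsigned int add_entropy(unsigned int seed)
{
  unsigned char u;  /* indeterminate contents */
  return seed ^ u;  /* unsigned char has no trap representations (C99 6.2.6.1) */
}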

Less randomness: the simplified version

“But my compilation platform does not have trap representations for type int, either, therefore I can use an uninitialized int variable and expect an unspecified value (a much better prospect than undefined behavior)”, one may think. This line of reasoning is attractive. It could even explain the behavior shown in Xi Wang's blog post and reproduced in simplified form below:

$ cat i.c
int f(int x)
{
  int u;
  return u ^ x;
}
$ gcc -O2 -std=c99 -S -fomit-frame-pointer i.c
$ cat i.s
…
_f:
Leh_func_begin1:
	ret
…

On this 64-bit platform, the argument x passed to f() is in register %edi, and the result of f() is expected in register %eax. Thus, by executing instruction ret directly, function f() is not even giving us back the entropy we provided it. It is instead giving us the current contents of %eax, which may not be random at all.

(Giving us back the entropy we passed to it would have been mov %edi, %eax followed by ret, a longer sequence.)

One may argue that the compiler has only opportunistically chosen the most convenient value for variable u, that is, x xored with the current contents of %eax, so that u xored with x is just the current contents of register %eax. This fits the interpretation of “unspecified value” for C99's definition of “indeterminate contents”. It is a good argument, but just wait until you have seen the next example.

The next example

#include <stdio.h>

int main(int c, char **v)
{
  unsigned int j;
  if (c==4) 
    j = 1; 
  else
    j *= 2;
  printf("j:%u ",j);
  printf("c:%d\n",c);
}

Unless exactly three command-line arguments are passed to the program, it should display an unspecified even number for j, right?

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
…
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) 
$ gcc -O2 t.c
$ ./a.out 
j:1 c:1

GCC version 4.4.3 has decided that since the “else” branch was reading from an uninitialized variable j, only the “then” branch was worth compiling. This is acceptable if reading uninitialized variable j is undefined behavior, but not if it is unspecified behavior. Let us insist:

$ gcc -Wall -O2 -std=c99 t.c
$ ./a.out 
j:1 c:1

Although we are requesting the C99 standard to be followed by GCC, the program is not printing for variable j the even unspecified value we are entitled to.

(In passing, a proper static analyzer would know that if it is going to show variable j as containing 1, it might as well show c as containing 4. Also, a proper static analyzer would remind you that your program must, in essence, only be used with three command-line arguments. The reason compilers do not do this is covered elsewhere.)

Between 1999 and 2011, C standards did not get shorter

In 2007, Rich Peterson, working at HP, was disappointed to find that the “Not a Thing” (NaT) value that registers can have on the Itanium architecture could not be used to implement an uninitialized unsigned char.

One thing led to another, and the C11 standard was amended with the phrase “If the lvalue designates an object of automatic storage duration that could have been declared with register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer, and no assignment to it has been performed prior to the use), the behavior is undefined.”

That would have been my reaction too, if I were paid by the word. Anyway, this additional sentence re-introduces undefined behavior where there was none in C99.


In the example above, the address of j was never taken, so maybe that's GCC's excuse. Let us check:

#include <stdio.h>

int main(int c, char **v)
{
  unsigned int j;
  unsigned int *p = &j;
  if (c==4) 
    j = 1; 
  else
    j *= 2;
  printf("j:%u ",j);
  printf("c:%d\n",c);
}

$ gcc -O2 t.c
$ ./a.out 
j:1 c:1

No, GCC is still acting as if j *= 2; was undefined.

Conclusion

I am not saying that this is not a bug in GCC. Perhaps it was fixed in later versions (in fact, that version does not accept -std=c11, so it must be rather old). My thesis is that you might as well avoid reading from uninitialized variables as if it was undefined behavior, because otherwise compilers will bite you. This statement holds even if what we have witnessed here is a bug in GCC version 4.4.3.


Also, if this is a bug in GCC 4.4.3, this is the first time I identify a bug in GCC without the assistance of a random program generator. In other words, compiler bugs are rare, but they become surprisingly common if you stick to a strict interpretation of a necessarily ambiguous standard. And speaking of Csmith, if there is indeed a GCC bug here, said bug cannot be detected with Csmith, which does not generate programs like mine.

Thursday, January 24 2013

Customers, customers, customers

The recent posts on extremely minor undefined behaviors in zlib neatly tie in with a discussion on John Regehr's blog about the future-proofitude of C and C++.


Another insightful post in this regard is this one by Derek Jones. Derek claims that the situation is different for proprietary compilers with paying customers. The argument rings true to me. The only proprietary compiler I know that competes with GCC or Clang in terms of aggressive optimization (at the cost of breakage of existing code) is the Intel C++ compiler. But that is not a counter-example: I did not pay for icc, nor do I know anyone who pays for it in order to use it in production.

Wednesday, January 16 2013

Bad zlib, bad! No compare pointer!

In a previous post, we remarked that the decompression function of zlib, for some inputs, computes an invalid pointer. But at least it neither dereferences it nor compares it to another pointer.

Or does it?

Recipe for an invalid pointer comparison

Instrument

Take an ordinary zlib library version 1.2.7, and instrument it according to the diff below.

$ diff -u zlib-1.2.7/inffast.c zlib-modified/inffast.c
--- zlib-1.2.7/inffast.c	2010-04-19 06:16:23.000000000 +0200
+++ zlib-modified/inffast.c	2013-01-16 23:37:55.000000000 +0100
@@ -64,6 +64,9 @@
       requires strm->avail_out >= 258 for each loop to avoid checking for
       output space.
  */
+
+int printf(const char *fmt, ...);
+
 void ZLIB_INTERNAL inflate_fast(strm, start)
 z_streamp strm;
 unsigned start;         /* inflate()'s starting value for strm->avail_out */
@@ -316,6 +319,7 @@
     strm->next_in = in + OFF;
     strm->next_out = out + OFF;
     strm->avail_in = (unsigned)(in < last ? 5 + (last - in) : 5 - (in - last));
+    printf("out=%p, end=%p\n", out, end);
     strm->avail_out = (unsigned)(out < end ?
                                  257 + (end - out) : 257 - (out - end));
     state->hold = hold;

This instrumentation causes two pointers, out and end, that are compared at that point of the library, to be printed. Apart from that, it does not change the behavior of the library. The library behaves the same when the printf() call is not there, albeit less observably.

Following the documentation, write an ordinary main()

Prepare a main() function that uncompresses a buffer deflated_buffer more or less according to the official tutorial:

main(){
  unsigned char out_buffer[CHUNK_OUT];

  printf("out_buffer: %p\n", out_buffer);

  /* allocate inflate state */
  ...

  ret = inflateInit(&my_strm);

  if (ret != Z_OK)
    return ret;

  my_strm.next_in = deflated_buffer;
  my_strm.avail_in = 40;
  my_strm.next_out = out_buffer;
  my_strm.avail_out = CHUNK_OUT;
	
  ret = inflate(&my_strm, Z_FINISH);

  ...
}

You can download the file zlib_UB.c.

Secret ingredient

Season with a specially crafted deflated buffer:

unsigned char deflated_buffer[40] = {
  120,
  156,
  67,
  84,
...

Test

Having built the instrumented zlib, you can compile and link the main() function:

$ gcc -I Downloads/zlib-modified/ zlib_UB.c Downloads/zlib-modified/libz.a
$ ./a.out 
out_buffer: 0x7fff5bd9fa34
out=0x7fff5bd9fa33, end=0x7fff5bd9fa5e

Although we followed the documentation when writing the main() function that calls it, zlib appears to be, at inffast.c:320, comparing a pointer out that points before out_buffer to a pointer end that points inside out_buffer. In fact, the pointer out has been computed as an offset of out_buffer, namely, out_buffer - 1. Zlib is not supposed to do this, for two reasons:

  1. The compiler could place out_buffer at address 0, or at the beginning of a segment in a segmented architecture. Then out_buffer - 1 would evaluate to 0xffffffffffffffff and appear to be larger than end, or cause a hardware exception. To be fair, in general the same value still ends up being stored in strm->avail_out, because both branches of the expression (out < end ? 257 + (end - out) : 257 - (out - end)) compute the same thing.
  2. Even without the array out_buffer being placed at address 0—which is rather unlikely—the compiler could suddenly become too smart for its own good and generate code that treats the condition out < end as false in this case. This has happened before. Key quote: “Some versions of gcc may silently discard certain checks for overflow. Applications compiled with these versions of gcc may be vulnerable to buffer overflows.” In this case, the invalid pointer out_buffer - 1 would be interfering with a compiler optimization, so anything might happen. Since it is strm->avail_out the statement is computing, a buffer overflow might result.

Fix

This problem and the previously described one have the same cause; they stem from the following optimization in inffast.c:

#ifdef POSTINC
#  define OFF 0
#  define PUP(a) *(a)++
#else
#  define OFF 1
#  define PUP(a) *++(a)
#endif

Defining POSTINC makes both the minor issues go away.
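
The following compilable sketch (not zlib's actual code) contrasts the two idioms: with the default OFF == 1, the working pointer starts one byte before the array, which is already an invalid pointer; with POSTINC (OFF == 0), it always stays within the array or one past its end.

#include <stddef.h>

static void fill_pre(unsigned char *buf, size_t n)
{
  unsigned char *out = buf - 1;  /* OFF == 1: invalid pointer before the array */
  for (size_t i = 0; i < n; i++)
    *++out = 0;                  /* PUP(a) is *++(a) */
}

static void fill_post(unsigned char *buf, size_t n)
{
  unsigned char *out = buf;      /* OFF == 0: always a valid pointer */
  for (size_t i = 0; i < n; i++)
    *out++ = 0;                  /* PUP(a) is *(a)++ */
}

int main(void)
{
  unsigned char out_buffer[64];
  fill_pre(out_buffer, sizeof out_buffer);  /* computes buf - 1: undefined */
  fill_post(out_buffer, sizeof out_buffer); /* well-defined */
  return 0;
}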

Conclusion

The defect identified here is related to the previous one, and like it, it is probably innocuous. Nevertheless, we initially set out to formally verify zlib, so we should stick to it: a formally verified zlib wouldn't compute negative offsets of arrays, much less compare them to valid offsets of the same array. So far, I have not been able to confirm any of the potential issues listed by Frama-C's value analysis in zlib with POSTINC defined, so that version of the library may reveal itself to be the formally verified version we are after.


This post was proofread by Fabrice Derepas and Olivier Gay.

Saturday, November 17 2012

Compiler-driven language development

A quiz

What will pressing “return” below reveal that GCC has done wrong?

$ cat t.c
#include <limits.h>
int x = -1;

int main(int c, char **v) {
  x = INT_MIN % x;
  return x;
}
~ $ gcc -std=c99 t.c
~ $ ./a.out

Answer:

$ ./a.out 
Floating point exception

The programmer was using an x86_64 processor

GCC has compiled a perfectly well-defined C99 program for returning 0 into binary code that aborts with an error. The error is misdiagnosed as a floating point exception but is actually a division overflow. It happens because computing the C99 remainder operation % on this computer invokes the x86_64 instruction idivl, which can raise this error when invoked with the arguments passed to it in this program. The idivl instruction computes a quotient and a remainder at the same time; the overflow error relates to the computation of the quotient, which my program was going to throw away anyway.


My program was well-defined in C90, too. In both these versions of the C standard, the semantics of %, like those of the other arithmetic operations, are described informally as first computing the result as an unbounded integer; since the result 0 fits the signed destination type int, the operation is defined and the result is 0.

Over the period of time it has been generating code for x86 computers, GCC could have become standard-compliant by adding the couple of instructions necessary to compute the standard-mandated result. The most efficient sequence I see would be to test the divisor for equality with -1 and, if they are equal, to conditionally move the divisor into the dividend. This would compute -1 % -1 instead of dividend % -1, always producing the same result 0 without the need for an expensive branch.
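
In C terms, the proposed guard amounts to the sketch below; the compiler would of course emit it as a compare, a conditional move and the idivl, not as actual C code.

/* A guarded remainder: for any dividend, X % -1 is 0, so substituting -1 for
   the dividend when the divisor is -1 preserves the result and keeps idivl
   from overflowing. */
int guarded_remainder(int dividend, int divisor)
{
  if (divisor == -1)
    dividend = -1;
  return dividend % divisor;
}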

GCC would not even have needed to generate this code all the time. Most of the times the divisor is statically known not to be -1 or the dividend is statically known not to be INT_MIN. In either case the guarding test is unnecessary.

To be fair, GCC can generate sophisticated assembly code when it exerts itself. If a program uses both dividend % divisor and dividend / divisor nearby in the same function, the generated assembly code is likely to call the division instruction idivl only once. My proposed conditional move would interfere with this optimization when it applies.

What it must be like to standardize a language like C

Good news! GCC held out, and now (in the C11 standard), INT_MIN % -1 is officially undefined behavior, so GCC is allowed to do what it wants. This goes to show two things:

  1. Revisions of the C standard are carefully crafted to allow as many compliant programs as possible to continue to be considered compliant, but this is only one of several conflicting objectives. My program is C99-compliant and is undefined in C11.
  2. Sometimes the standard defines the future direction of the language (e.g. the memory model part of C11) but sometimes it only ratifies whatever non-compliant semantics compilers have been implementing for years.

Wednesday, July 25 2012

The previous post was written in jest

Just a quick update to provide context for the previous post.


The previous post assumes the reader is familiar with the notion of undefined behavior and how C compilers have started to justify their more aggressive optimizations along the lines of “this program has always been undefined”.

Long ago, a C programmer trying to write a portable C program had to avoid signed overflows because different architectures could exhibit different behaviors depending on whether signed numbers were represented as sign-magnitude, in one's complement or in two's complement. These days are gone. A few dedicated architectures (DSPs) may give you saturating behavior for overflows, but nearly every architecture now uses two's complement.

A modern programmer might think “I know my target architecture uses two's complement representation, and I want wrap-around behavior here, so I will just use signed addition”.

The problem nowadays is that compilers take advantage of undefined behavior for optimization. An expression such as X+1 > X with X of type int may be optimized to 1, because the only case when it is not true is when X+1 overflows, which is undefined behavior, and therefore the compiler can do what it wants then. Incidentally, this means the same compiler compiles X+1 > X into code that produces different results for X equal to INT_MAX at different optimization levels.
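
A sketch of the kind of program where this shows up; the comments describe typical, not guaranteed, compiler behavior, and the output depends on the compiler and optimization level:

#include <limits.h>
#include <stdio.h>

/* With optimization, the comparison may be folded to 1 on the grounds that
   signed overflow never happens in a defined execution; without it, the
   two's complement wrap-around to INT_MIN typically makes it return 0. */
int successor_is_greater(int x)
{
  return x + 1 > x;
}

int main(void)
{
  printf("%d\n", successor_is_greater(INT_MAX));
  return 0;
}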


The previous post suggested to use a statement that is undefined when p and q alias in order to free the compiler of any constraints in these circumstances. The compiler can effectively do anything it wants, including returning the incorrect result 3, if the same address is passed as both arguments of f2(), because the function is undefined then. This was not a serious suggestion, but should be understood as an argument in the debate about the exploitation of undefined behavior in compiler optimization.


This debate is interesting to me as someone who works on the static analysis of critical embedded C, because embedded code has constraints that make some kinds of undefined behaviors unavoidable. And critical code shouldn't use any of the dangerous undefined behaviors. And the frontier between the unavoidable undefined behaviors and the dangerous undefined behaviors is not written anywhere, and it is always moving.


The previous post was largely written in jest. I was not suggesting to substitute the respected restrict keyword with an awkward replacement. All the arguments I put forward were in bad faith.

On the redundancy of C99's restrict

The restrict keyword in C99

C99 introduced a restrict keyword. The intention is to let the programmer specify the absence of aliasing between some inputs of a function ey is writing.

Consider the function:

int f1(int * restrict p, int * restrict q)
{
  *p = 1;
  *q = 2;
  return *p + *q;
}

Thanks to the restrict keyword, GCC can compile function f1() into the following assembly code:

$ ~/gcc-172652/bin/gcc -O3 -std=c99 -S restr.c
$ cat restr.s
...	
	movl	$1, (%rdi)
	movl	$2, (%rsi)
	movl	$3, %eax
	ret
...

Note that the -std=c99 option is necessary. The code, parsed as a C89 program, is syntactically incorrect because restrict is not a keyword in C89.

In the generated x86_64 code, %rdi is the first argument p, %rsi is the second argument q, the parentheses indicate dereferencing, and the dollar sign indicates a constant. By convention, register %eax contains the return value when the ret instruction is reached.

The above is pretty good code. Since the compiler is told that the pointers passed to function f1() as p and q never alias, it knows that at the point of the return, *p must be 1, and therefore the returned expression evaluates to 3.

Contrast with the same function compiled without the restrict annotations, say, because all you have is a C89 compiler:

int f(int * p, int * q)
{
  *p = 1;
  *q = 2;
  return *p + *q;
}

...
	movl	$1, (%rdi)
	movl	$2, (%rsi)
	movl	(%rdi), %eax
	addl	$2, %eax
	ret

This generated assembly code is not as good. The compiler knows that *q can only be 2 at the point of the return statement, but it cannot be certain that *p contains 1, because the function could be passed the same address for both arguments. In order for the code to work in all cases, the compiler decided to reload *p from memory for the addition.

My proposal

I claim that the restrict keyword in C99 both lacks expressive power and is redundant with constructs that already existed in C89. There was no need to encumber the language with a new construct, especially a new keyword that breaks the compilation of existing code that uses a variable named restrict.

C89 already had undefined behavior, a powerful tool for instructing the compiler that it does not have to generate code that handles special cases.


A function f2() that works under the same hypotheses as f1() can be written in C89:

int f2(int * p, int * q)
{
  (*p=*p) & (*q=*q);
  *p = 1;
  *q = 2;
  return *p + *q;
}

The first statement in the body informs the compiler that f2() will never be called with aliasing arguments p and q. The compiler is then free to generate the same code as was generated for the restrict-using function f1(). It is completely legal for the compiler to assume that *p + *q at the return statement evaluates to 3 in the function above. GCC does not generate efficient code for the function f2() above, but it could. It generates the same code as for the plain function f(), correctly compiling (*p=*p) & (*q=*q); to nothing, but failing to take advantage of the freedom it provides to generate a function that does not work when p and q alias.

Wrapping up

My proposal is more flexible than restrict, allowing fine-grained non-aliasing relations to be specified between specific pointers. It is slightly more verbose, especially when there are many pointers, but this is a normal downside of specifying aliasing relations pointer by pointer. Note that the pairwise absence of aliasing between p, q and r can be specified with the compact (*p=*p) & (*q=*q) & (*r=*r);


Compiler makers, please support the (*p=*p) & (*q=*q); code annotation in your compilers.