Frama-C news and ideas

To content | To menu | To search | Frama-C Home

From Pascal strings to Python tuples

Quiz time

What does the program below do?

#include <stdio.h>

int main(){
  struct {
    int t[4];
    int u;
  } v;
  v.u = 3;
  v.t[4] = 4;
  printf("v.u=%d", v.u);
  return 0;
}

Two answers are “it prints v.u=4” and “it prints v.u=3”:

$ gcc t.c && ./a.out 
v.u=4
$ gcc -O2 t.c && ./a.out 
v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v. The problem is that v.t+4 is outside object v.t. So GCC does what it pleases, and when compiling with -O2, optimizes brutally:

$ gcc -S -O2 t.c && cat t.s
.LC0:
	.string	"v.u=%d\n"
…
	movl 	$3, %edx
	movl 	$.LC0, %esi
	movl	 $1, %edi
	xorl	%eax, %eax
	call	__printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c
t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4, but here the index is hard-coded as 4, so line 9 is only valid when 4 < 4. That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h"

typedef struct{
  int tab[1];
} ts;

int main() {
  ts *q = malloc(5*sizeof(int));
  q->tab[2]= 5;
  return 1;
}

If you guessed “invoke undefined behavior”, well done!


The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[1];

   /* ob_item contains space for 'ob_size' elements.
    * Items must normally not be NULL, except during construction when
    * the tuple is not yet visible outside the function that builds it.
    */
} PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, in the C99 standard we got “incomplete types”, a safe way to implement tuples and Pascal strings:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[];
…

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).

Comments

1. On Thursday, August 1 2013, 09:22 by David Mentré

Thanks Pascal, I'll reuse them!

d.

2. On Thursday, August 1 2013, 09:51 by Matthieu

It is worth noting that this idiom is ubiquitous. It was even "promoted" by the GCC developpers, through the zero-length array extension:
http://gcc.gnu.org/onlinedocs/gcc/Z...

However, it seems that people are moving to the standard conforming solution; for instance, the POSIX directory entry structure used this idiom (http://www.delorie.com/gnu/docs/dir...), but newer standards have switched to C99: http://pubs.opengroup.org/onlinepub...

One important issue is that C99 support is poor in most compilers, so the old behaviour still has to be used (preferably optionally, using a macro) in portable code.

3. On Thursday, August 1 2013, 14:52 by pascal

@Matthieu

I see the zero-length array GCC extension to C90 as recognition by the GCC developers that something special needed to be done when the intention was “an array of unknown length, e.g. as the last member of a struct”. Since the standard forbade and forbids arrays of size 0, this extension was minimally disruptive.

Frama-C warns for v.t[4] but allows int *p = v.t; p+= 4; *p. Choosing what to allow and what to reject is a very difficult line to walk, but in this case, Frama-C seems to match quite precisely what the compiler allows itself to optimize or not:

~ $ cat t.c

#include <stdio.h>

int main(){

  struct {

    int t[4];

    int u;

  } v;

  v.u = 3;

  int *p= v.t;

  p += 4;

  *p = 4;

  printf("v.u=%d\n", v.u);

  return 0;

}

~ $ gcc -O2 t.c && ./a.out 

v.u=4