Frama-C news and ideas

Easy value analysis example: putnum()

If some of the posts in this blog ever get re-organized into a course-style document, this one will be near the beginning, because it's simple and self-contained.

The program

This example was offered on our bug-tracking system:

```int putchar(int);

void print(const char *ptr)
{
while (*ptr)
putchar(*ptr++);
}

void putnum(unsigned long n)
{
char buf[10], *ptr = buf + sizeof(buf);
*--ptr = 0;
do {
*--ptr = "0123456789ABCDEF"[n&15];
n >>= 4;
} while(n);
print(ptr);
}
```

A few minutes of looking at the code should allow you to infer for yourself that function `putnum()` is for printing unsigned hexadecimal numbers.

The precision issue

The reporter was disappointed that the value analysis emitted a number of alarms for this code, which seems trivial. This brings to mind the quip by Richard Feynman that "mathematicians can prove only trivial theorems, because every theorem that's proved is trivial". Excuse the approximate quote: this is another book that I sop up in audio form, and these things are even worse than paper for searching. But perhaps the case deserves to be called trivial indeed, because in the original program, as provided by the reporter in the BTS, `putnum()` is called only with a single, concrete value:

```int main(void)
{
putnum(0x12345);
putchar('\n');
return 0;
}
```

This should help the value analysis: it likes to analyze functions with only the values for arguments and globals that appear to be really passed to them at execution. However, remember that most classic, bug-finding static analyzers would analyze function `putnum()` in abstracto, and tell you what they think pertinent to tell you regardless of whether the function is called only with `0x12345`, or indeed, never called at all — and therefore, like the plane that always stays on the ground, is extremely safe.

Back to the value analysis. What are these alarms that it finds?

```\$ frama-c -val putnum.c
...
putnum.c:16:[kernel] warning: out of bounds write. assert \valid(ptr);
putnum.c:7:[kernel] warning: accessing uninitialized left-value *ptr: assert(Ook)
putnum.c:8:[kernel] warning: accessing uninitialized left-value *tmp: assert(Ook)
putnum.c:7:[kernel] warning: out of bounds read. assert \valid(ptr);
putnum.c:8:[kernel] warning: out of bounds read. assert \valid(tmp);
...
```

Fair enough. The value analysis does emit a number of alarms. Some of them are nearly duplicates. For instance, variable `tmp` is only, in function `print()`, the temporary introduced in the translation of `*ptr++`. It's by design that the analysis no longer even tries to avoid multiple warnings in this case. On the whole, it does limit its study of function `putnum()` to the case where `n` is `0x12345`. The imprecision comes from somewhere else: the loop inside `putnum()`. By default, the analyzer summarizes the loop in an unfavorable way. Variable `ptr` is only inferred to range over `{{ &buf + [-1..8] }}` at the point where it's dereferenced, so it is not guaranteed to be valid:

Since the analyzer doesn't know which `buf[]` element exactly is written to, all it can say is that each element between 0 and 8 may be `[48..70] or UNINITIALIZED`:

Note that `48` is `'0'` and `70` is `'F'`, so it is not doing too bad, really, despite emitting all these alarms.

The solution

What's the step after checking what library functions are used in the program and thinking how to offer equivalent source code to the analyzer? Ah, yes, it is: "if the analysis is fast enough, use option `-slevel` to make it more precise".

```\$ frama-c -val putnum.c -slevel 50
...
[value] Values for function putnum:
n ∈ {0; }
buf[0..3] ∈ UNINITIALIZED
[4] ∈ {49}
[5] ∈ {50}
[6] ∈ {51}
[7] ∈ {52}
[8] ∈ {53}
[9] ∈ {0}
ptr ∈ {{ &buf + {4} }}
```

The alarms all disappear. We can make sure this is not caused by a freakish accident but because the value analysis understand exactly what goes on during execution of this program: the hexadecimal representation of `0x12345` in the contents of array `buf[]`, in a snapshot taken at the end of `putnum()`.

Generalizing

The exercise can be made more interesting by considering the safety of `putnum()` when called with an unknown argument:

```unsigned long unknown();

int main(void)
{
putnum(unknown());
putchar('\n');
return 0;
}
```

I have to admit I made a mistake here when replying to the bug report: I did not try it. Biased by hours spent on difficult case studies, I thought that the value analysis would need to have different execution paths (where the program is valid for slightly different reasons) manually separated in order to conclude that the program is safe. Something like the following hint, placed in a strategic location, according to the principles in this post.

```/*@ assert n <= 15 || 16 <= n <= 65535 || 65536 <= n <= ... ;
```

Such an annotation, typical of those needed for a larger-scale value-analysis-based verification, is not required here (but it does not hurt). Even without it, the value analysis separates the cases where the number is one-digit, two-digits, etc. long by itself, in few millions of processor cycles — that is, in a fraction of a second:

```\$ frama-c -val putnum.c -slevel 500
...
[value] Values for function putnum:
n ∈ {0; }
buf[0] ∈ UNINITIALIZED
[1..7] ∈ [48..70] or UNINITIALIZED
[8] ∈ [48..70]
[9] ∈ {0}
ptr ∈ {{ &buf + [1..8] }}
```

For an annotationless analysis, a lot of information is contained in this snapshot of the program's state. The value analysis is strongly suggesting that it is safe to reduce the size of `buf[]` by one, since `ptr` never goes lower than `buf+1` and `buf[0]` remains uninitialized for all possible `putnum()` arguments. All numbers are printed with at least one digit (`buf[8]` is guaranteed to be initialized and in the range '0'..'F', and `ptr` points no higher than `buf[8]`). On the other hand, elements 1 to 7 are not guaranteed to be written to.

But wait: the value analysis seems to be saying that it's safe to reduce the size of `buf[]` by one, but perhaps this code is not intended only for 32-bit architectures? Perhaps the additional element serves to make the function safe on a 64-bit architecture. Let us try it out:

```\$ frama-c -val putnum.c -slevel 500 -machdep x86_64
...
putnum.c:16:[kernel] warning: out of bounds write. assert \valid(ptr);
...
[value] Values for function putnum:
n ∈ {0; }
buf[0..7] ∈ [48..70] or UNINITIALIZED
[8] ∈ [48..70]
[9] ∈ {0}
ptr ∈ {{ &buf + [0..8] }}
```

On a 64-bit architecture, all the elements of `buf[]` seem to be useful, but there may be a problem with `ptr` at line 16. Determining whether this is a true or false alarm will be left as an exercise to the reader (download file putnum.c).

Zaynah Dargaye provided insights based on an early version of this post.