Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Friday, July 27 2012

Oxygen is stricter about types and why you should get used to it

I have just sent a list of changewishes (1, 2) to a static analysis competition mailing-list, and that reminded me of a blog post I had to write on the strictness of the type-checker in upcoming Frama-C release Oxygen. This is the blog post.

This post is not about uninitialized variables

The static analysis competition in question evaluates each participating tool on many testcases. A good number of these testcases are rejected by the current Frama-C. This is why I was writing to the mailing-list: Frama-C cannot participate in the competition if the testcases are not changed (Frama-C does not have to participate, of course. It's just that it looks fun and we would probably like to be in it).

In fact, many of the testcases were already problematic for Frama-C version 20111001 (Nitrogen), the version in the current Debian, Ubuntu, FreeBSD, NetBSD and other Unix distributions. Indeed, a lot of the testcases rely on uninitialized variables for entropy, which this post by Xi Wang and this post by Mark Shroyer show is wrong. Instead of the problem that is supposed to be detected (or not), Nitrogen detects the uninitialized use. I covered this already; this blog post is not about uninitialized variables (keep reading!).

This post is about C type-checking

While trying to get a list of uninitialized-variable-using testcases, I realized that something had changed since my last evaluation of the competition's benchmarks. Many of them were now rejected at type-checking!

The new problem is, many testcases in the benchmarks call functions without having provided a prototype, and some ultimately define the called function with a type incompatible with the type inferred at the call site. Frama-C used to be lenient about typing issues, but after fixing one soundness bug too many that was caused by the leniency, we have decided to throw in the towel. Virgile described one of the typing issues for which Frama-C used to be too lenient. It was not the only one.

This is an unusual position to take. In the article A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World, the makers of a commercially successful bug-finding tool state:

Our product's requirements roughly form a "least common denominator" set needed by any tool that uses non-trivial analysis to check large amounts of code across many organizations; the tool must find and parse the code, [...]

The above is wise: to be maximally useful, the static analyzer should choke on as few idiosyncrasies as possible. The analyzed code can be assumed to compile with some compiler, and to go through some tests (there must still be a few of these, right)? Why warn when a function is called without a prototype, if the compiler accepted it? Why reject when the function's implementation has a type that is incompatible with the type that was inferred at the call site? This is probably not the issue the user is looking for (if it was, the user would have fixed it when eir compiler warned for it).

Oxygen is strict and that's it

Well, our answer is that we tried and we found that it was too much work to try to be sound and precise with these constraints, as exemplified by Virgile's blog post. Show us a tool that accepts your weakly typed program, and we will show you a tool that probably isn't sound and precise at the same time (we have kept the examples that led to the decision. These examples demonstrate real issues masked by lenient typing. If you think you have found a sound analyzer that is at the same time conveniently permissive on typing, it will be our pleasure to provide the examples for you to try).


We hope that Frama-C will still be useful with these new restrictions on typing. Fortunately for us, there are more real worlds than the expression “the Real World” in the cited article's title might lead you to think (and this quip should not be taken as a reproach towards the authors of “A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World”, a very useful document with an appropriately chosen title considering its intended public).

Tuesday, May 22 2012

Iterating over the AST

Context

A facetious colleague who claims that he should be better writing his thesis but keeps committing Coq and OCaml files on Frama-C's repository, asked me the following question: Is there a function in Frama-C's kernel that can fold[1] a function over all types that appear in a C program?

As it turns out, nothing is readily available for this exact purpose, but another facetious colleague could have come up with a terse answer: There's a visitor for that! Now, we can be a bit more helpful and actually write that visitor. Our end goal is to come up with a function

fold_typ: (Cil_types.typ -> 'a -> 'a) -> 'a -> 'a

so that fold_typ f init will be f t_1 (f t_2 (... (f t_n init)...)), where the t_i are all the C types appearing in a given C program (in no particular order).

Frama-C's visitors

The visitor pattern is a well-known object-oriented design pattern that can be used to perform a given action over all nodes of a complex data structure (in our case the Abstract Syntax Tree -AST for short- of a C program). Frama-C provides a generic visitor mechanism, built upon CIL's visitor, whose entry points can be found in the aptly named src/kernel/visitor.mli file. It is also documented in section 5.14 of the developer manual. In summary, you first define a class that inherits from the generic visitor and overrides the methods corresponding to the nodes you're interested in (in our case, this will be vtype for visiting uses of Cil_types.typ). Then you apply an object of this class to the function from the Visitor module that correspond to the subpart of the AST that you want to inspect (in our case, this will be the whole AST).

Basic version

The standard visitor does not provide anything to return a value outside of the AST. In fact, all the entry points in Visitor return a node of the same type that the node in which the visit starts (for visitors that don't perform code transformation, this is in fact physically the same node). But if you accept to sneak in a little bit of imperative code into your development -and by using Frama-C you've already accepted that- there is an easy way out: pass to the visitor a reference that it can update, and you just have to read the final value that reference is holding after the visit. The visitor then looks like the following:

class fold_typ_basic f acc =
object
  inherit Visitor.frama_c_inplace
  method vtype ty = acc:= f ty !acc; Cil.DoChildren
end

And that's it. Each time the visitor sees a type, it will apply f to ty and the result of the previous computations, stored in acc. fold_typ then just needs to call the visitor over the whole AST and give it a reference initialized with init:

let fold_typ f init =
  let racc = ref init in
  let vis = new fold_typ_basic f racc in
  Visitor.visitFramacFileSameGlobals vis (Ast.get());
  !racc

Don't do the same work twice

This first version, that is barely more than 10 LoC, works, but we can do a little better. Indeed, f will be called each time a type is encountered in the AST. In most cases, we want to call f once for any given type. This can be done quite simply by memoizing in an instance variable of our visitor the set of types encountered thus far. Frama-C's Cil_datatype module (cil/src/cil_datatype.mli) provides all the needed functions for that:

class fold_typ f acc =
object
  inherit Visitor.frama_c_inplace
  val mutable known_types = Cil_datatype.Typ.Set.empty
  method vtype ty =
    if Cil_datatype.Typ.Set.mem ty known_types then Cil.DoChildren
    else begin
      known_types <- Cil_datatype.Typ.Set.add ty known_types;
      acc:= f ty !acc;
      Cil.DoChildren
    end
end

Testing the infrastructure

It is now time to test if everything works smoothly. The following function will print the name and size of the type who has the biggest size in the analyzed program.

let test () =
  let f ty (maxty,maxsize as acc) =
    try
      let size = Cil.sizeOf_int ty in
      if size > maxsize then (ty,size) else acc
    with Cil.SizeOfError _ -> acc
  in
  let (ty,size) = fold_typ f (Cil.voidType,0) in
  Format.printf "Biggest type is %a,@ with size %d@." 
    !Ast_printer.d_type ty size

Since it is only a quick test, we don't do anything special if Cil complains that it cannot compute the size of a given type: we just stick to the maximal value computed so far.

File fold_typ.ml provides the code for the visitor and the test function. frama-c -load-script fold_typ.ml file.c should output something like

[kernel] preprocessing with "gcc -C -E -I.  file.c"
Biggest type is struct Ts,
with size 44

Exercises

  1. How can you use the fold_typ class to define an iter_typ function that apply a function f returning unit to each type of the AST (val iter_typ: (Cil_types.typ -> unit) -> unit)?
  2. Writing fold_typ is a bit overkill if you're going to apply it once in your plug-in. Write a specialized visitor that will do the same thing as the test function above.

Notes

[1] For those who are not familiar with functional programming: http://en.wikipedia.org/wiki/Fold_%...

Monday, April 11 2011

C99 promotion rules: what ?!

I reported bug 785 after getting three-quarters sure that the problem was in Frama-C (I used two different versions of GCC as oracles).

I still cannot believe how twisted integer promotion rules are in C99. I had read them before, but I had not followed the precise path I followed today in the various cases of 6.3.1.8.

The fun starts earlier, in 6.3.1.3, with the definition of conversion rank.

rank(long long int) > rank(long int), even when these two types have the same size. But:

The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.

So if a compiler defines an extended long long long type, programs that use them will become differently typed the day long long long becomes a standard type. Until it is standardized, it has lower rank than a standardized type of the same size. When it becomes standardized, it will probably get a higher rank than other standardized types of the same size.


Well, never mind this: it is unclear whether this can have any effect on the computations in practice (it matters only when types have exactly the same size). Let's see what in 6.3.1.8 applies to bug 785, starting with the promotions applicable to 0x090E7AF82577C8A6LL | x9[1][0].

We have a `long long` on the left-hand side, an `unsigned long` on the right-hand side. Oh, and in this platform description, `long` and `long long` are both the same size, 64-bit.

  1. blah blah "If both operands have the same type" nope
  2. "if both operands have signed integer types or both have unsigned integer types" nope
  3. "if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand" nope
  4. "if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type" nope, it's missing half of them.
  5. "Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type."

There is a philosophy hidden in there. The philosophy seems to be "favor unsigned types, unless the signed type makes more sense". Even a signed type of higher hank may see its values converted to the unsigned version of the same type, changing the meaning of half of them. The rank matters only when it confirms that the unsigned type should be favored. You may be forgiven for wondering why we bother with ranks at all.


In conclusion, both 0x090E7AF82577C8A6LL and x9[1][0] should be promoted to unsigned long long, which shall also be the type of the result. This is not what Carbon does.

Sunday, November 21 2010

IEEE 754 single-precision numbers in Frama-C

Every once in a while, someone asks about single-precision floating-point support in Frama-C. Until recently it was often in the context of the value analysis, but actually, thanks to a lot of interesting new results obtained in the context of this project, people working on deductive verification within Frama-C can now look forward to being harassed too.

We are not nearly home and dry yet, though. Frama-C's front-end is CIL (no relation to something from Microsoft with a similar acronym). CIL is written in OCaml (the rest of Frama-C is written in OCaml too, but never mind that). OCaml has a single floating-point type, which corresponds to IEEE 754 double precision.

So guess how floating-point literal constants are represented in the abstract syntax tree built by CIL...

If you guessed "as an IEEE 754 double-precision number", you are partly right. Yay! But also as a string. CIL's authors did think about this, especially since one of CIL's first use was for source-to-source transformation. Specifically, you have all the information in this constructor defined in file frama-c/cil/src/cil_types.mli:

| CReal of float * fkind * string option
    (** Floating point constant. Give the fkind (see ISO 6.4.4.2) and also
     * the textual representation, if available. *)

The above is OCaml code, and the type float means double-precision floating-point number.

So far so good. If there is a 10.0 literal in the program, it stands for the double 10.0. If there is a 0.1 in the program, it stands for one of representable numbers no further than 1ulp from the real 0.1. This is slightly annoying because the representable number chosen by CIL may not be the same as the representable number chosen by your actual C compiler, but in all likelihood both will round the number from the source code to the nearest double-precision floating-point number. Thus the AST analyzed in Frama-C accurately represents the compiled program and all is well.

The picture gets muddied a little if the target program contains a literal such as 3.14f. This is supposed to represent a single-precision floating-point literal in C. I'm willing to bet without really looking that CIL represents that literal in the AST with the double-precision number nearest to the real number written in decimal in the target program. If you are a Frama-C plug-in author willing to do the right thing, you may think of rounding this double-precision number to the nearest single-precision number yourself, but that doesn't work: the double rounding can make your final result different from the compiler's, which will round only once from real to single-precision. To be correct, you have to go back to the string representation that has been saved from the source code.

In fact, Frama-C's front-end should be doing this for all plug-ins, of course. When the literal is single-precision, it should represent it in the AST as an OCaml float containing the single-precision number meant by the programmer (all single-precision numbers can be represented as double-precision numbers). Probably this will be done soon, now that the problem is identified. This was just an example of how far support for single-precision numbers in Frama-C is.

Full disclosure: I said I was betting without looking but I wouldn't take the bet if I were you. I do not just risk public humiliation like that without strong presumptions. First, I was just working on something related and visible effects of this subtlety came up as one annoying unforeseen issue. Secondly, there is no function in OCaml to parse a number as single-precision. There are some OCaml functions to parse double-precision numbers, and this task is enough of a bother that these are defined by calling the C function strtod(). The way to fix the front-end is probably to replicate this OCaml-C interface for strtof(). So my bet is in fact that CIL's authors neither implemented a single-precision parsing function in OCaml nor interfaced it from C.