Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - developer

Entries feed - Comments feed

Wednesday, September 19 2012

Never forget to sanitize your input

This post is a follow up of this one

A facetious colleague pointed out to me that the print_stmt function that is used to display the CFG in the post mentioned above behaves incorrectly when used over code that include string constants, such as the one below:

void f(const char *);

void test(int c) {
  if ("A random string"[c]=='s') f("Hello, world"); else f("Goodbye");
}

Namely, poor dot is a little bit confused by all these double quotes occurring in labels that are themselves delimited by double quotes. It manages to output something, but the result is not really satisfying: cfg_string_cst.png

Thus, it looks as though the first improvement to our little script will be to have a more robust pretty-printing function for statements. Basically, we have to escape those pesky double quotes that might appear when pretty-printing a C expression. Fortunately, OCaml's Printf (hence Format) module does know how to do this, with the "%S" directive. There's one drawback, though: this directive will also put (unescaped) double quotes at the beginning and end of the result, that we'll have to get rid of. In the end, the escaping function is the following:

let protect pretty chan v =
  let s = Pretty_utils.sfprintf "%a" pretty v in
  let s = Pretty_utils.sfprintf "%S" s in
  Format.pp_print_string chan (String.sub s 1 (String.length s - 2))

First, we create a string containing the the unescaped output. Pretty_utils.sfprintf is a Frama-C function that is analogous to Format.sprintf except that 1) it can be fed with the same pretty-printing functions as fprintf when there is a custom "%a" directive and 2) each call uses its own internal buffer, avoiding subtle flushing issues. Second, we escape all the inner double quotes. Finally, we output the result, except for the first and last characters that by definition of "%S" are double quotes.

We can now use protect in print_stmt each time there might be an issue (the complete file is available here):

 let print_stmt out =
  function
-   | Instr i -> !Ast_printer.d_instr out i
+   | Instr i -> protect !Ast_printer.d_instr out i
...
-   | If (e,_,_,_) -> Format.fprintf out "if %a" !Ast_printer.d_exp e
-   | Switch(e,_,_,_) -> Format.fprintf out "switch %a" !Ast_printer.d_exp e
+   | If (e,_,_,_) -> Format.fprintf out "if %a" (protect !Ast_printer.d_exp) e
+   | Switch(e,_,_,_) ->
+       Format.fprintf out "switch %a" 
+         (protect !Ast_printer.d_exp) e

Using this new version of the script, frama-c -load-script cfg_print_escape.ml test.c && dot -Tpng -O cfg.out produces a better result: cfg_string_cst_correct.png

Tuesday, September 4 2012

Extracting information from Frama-C programmatically

Extending the framework

Some time ago, one of our fellow users asked us whether it was possible to extract the control-flow graph (CFG) of C functions from Frama-C. Fact is, although the CFG is computed during the elaboration of the AST, nothing exists currently to present the information to the user. But Frama-C is a Flexible framework, and a small (50 LoC) script will provide this functionality albeit in a rather crude form. More specifically, we'll ask Frama-C to output a file in the dot format that will contain a graph for each of the functions of the program under analysis. Basically, our script is composed of three parts: a function to pretty-print the statements as node of the graph, a visitor to create the graphs, and the boilerplate code that takes care of registering the code into Frama-C kernel.

Pretty-printing

Frama-C already has a function to pretty-print statements, but it is not suitable for us, as it will recursively print substatements of compound statements (blocks, if, while, ...), while we only want to have a label for the current statement: substatements will be represented by other nodes. We'll use thus the following small function:

let print_stmt out =
 function
   | Instr i -> !Ast_printer.d_instr out i
   | Return _ -> Format.pp_print_string out "<return>"
   | Goto _ -> Format.pp_print_string out "<goto>"
   | Break _ -> Format.pp_print_string out "<break>"
   | Continue _ -> Format.pp_print_string out "<continue>"
   | If (e,_,_,_) -> Format.fprintf out "if %a" !Ast_printer.d_exp e
   | Switch(e,_,_,_) -> Format.fprintf out "switch %a" !Ast_printer.d_exp e
   | Loop _ -> Format.fprintf out "<loop>"
   | Block _ -> Format.fprintf out "<enter block>"
   | UnspecifiedSequence _ -> Format.fprintf out "<enter unspecified sequence>"
   | TryFinally _ | TryExcept _ -> Format.fprintf out "<try>"

Creating the graphs

In order to create our output, we must make a pass through the whole AST. An easy way to do that is to use CIL's visitor mechanism. There are three kinds of nodes where we have something to do. First, at the file level, we create the whole graph structure:

method vfile f =
    Format.fprintf out "@[<hov 2>digraph cfg {@\n";
    ChangeDoChildrenPost (f,fun f -> Format.fprintf out "}@."; f)

Then, for each function, we encapsulate the CFG in a subgraph (and don't do anything for other globals, hence the SkipChildren for the other branch of the pattern-matching).

  method vglob_aux g =
    match g with
      | GFun(f,loc) ->
          Format.fprintf out "@[<hov 2>subgraph cluster_%a {@\n\
                             graph [label=\"%a\"];@\n" 
            Cil_datatype.Varinfo.pretty f.svar
            Cil_datatype.Varinfo.pretty f.svar;
          ChangeDoChildrenPost([g], fun g -> Format.fprintf out "}@\n@]"; g)
      | _ -> SkipChildren

Last, for each statement, we have to create a node of the graph, and create the edges toward its successors:

  method vstmt_aux s =
    Format.fprintf out "s%d@[[label=\"%a\"]@];@\n" s.sid print_stmt s.skind;
    List.iter 
      (fun succ -> Format.fprintf out "s%d -> s%d;@\n" s.sid succ.sid)
      s.succs;
    DoChildren

Registration within Frama-C's main loop

Now, we want Frama-C to be aware of our script. Basically, this amounts to write a driver function of type unit -> unit that will launch the visitor process, and to tell Frama-C that this function should be run as an analysis:

let run () =
  let chan = open_out "cfg.out" in
  let fmt = Format.formatter_of_out_channel chan in
  Visitor.visitFramacFileSameGlobals (new print_cfg fmt) (Ast.get())

let () = Db.Main.extend run

Launching the script

In order to obtain a cfg.out file with our graphs, we simply have to call frama-c that way:

frama-c -load-script cfg_print.ml [other_options] file.c [file2.c] 

Then, provided graphviz is installed, the command

dot -Tpdf -o cfg.pdf cfg.out

will provide a pdf file similar to this one

What next?

This script is very basic, and should be improved in several ways in order to be useful beyond the point of showing how to interact with Frama-C's API (some of these might be the subject of new posts in the future)

  • The graphs could be fancier, in particular by distinguishing between branching nodes and plain ones, or showing exit of blocks as well as their beginning
  • For now, the graphs are generated each time the script is explicitely loaded from the command line. One might eventually want to compile it as a proper plugin, systematically loaded by Frama-C, with the graph generation commanded by a proper command-line option
  • Some other parameters might be interesting, such as choosing the name of the output file, including only the CFG of certain functions, ...
  • The script could interact with some plugins to give more information (e.g. showing dead paths for a given run of Value)

Tuesday, May 22 2012

Iterating over the AST

Context

A facetious colleague who claims that he should be better writing his thesis but keeps committing Coq and OCaml files on Frama-C's repository, asked me the following question: Is there a function in Frama-C's kernel that can fold[1] a function over all types that appear in a C program?

As it turns out, nothing is readily available for this exact purpose, but another facetious colleague could have come up with a terse answer: There's a visitor for that! Now, we can be a bit more helpful and actually write that visitor. Our end goal is to come up with a function

fold_typ: (Cil_types.typ -> 'a -> 'a) -> 'a -> 'a

so that fold_typ f init will be f t_1 (f t_2 (... (f t_n init)...)), where the t_i are all the C types appearing in a given C program (in no particular order).

Frama-C's visitors

The visitor pattern is a well-known object-oriented design pattern that can be used to perform a given action over all nodes of a complex data structure (in our case the Abstract Syntax Tree -AST for short- of a C program). Frama-C provides a generic visitor mechanism, built upon CIL's visitor, whose entry points can be found in the aptly named src/kernel/visitor.mli file. It is also documented in section 5.14 of the developer manual. In summary, you first define a class that inherits from the generic visitor and overrides the methods corresponding to the nodes you're interested in (in our case, this will be vtype for visiting uses of Cil_types.typ). Then you apply an object of this class to the function from the Visitor module that correspond to the subpart of the AST that you want to inspect (in our case, this will be the whole AST).

Basic version

The standard visitor does not provide anything to return a value outside of the AST. In fact, all the entry points in Visitor return a node of the same type that the node in which the visit starts (for visitors that don't perform code transformation, this is in fact physically the same node). But if you accept to sneak in a little bit of imperative code into your development -and by using Frama-C you've already accepted that- there is an easy way out: pass to the visitor a reference that it can update, and you just have to read the final value that reference is holding after the visit. The visitor then looks like the following:

class fold_typ_basic f acc =
object
  inherit Visitor.frama_c_inplace
  method vtype ty = acc:= f ty !acc; Cil.DoChildren
end

And that's it. Each time the visitor sees a type, it will apply f to ty and the result of the previous computations, stored in acc. fold_typ then just needs to call the visitor over the whole AST and give it a reference initialized with init:

let fold_typ f init =
  let racc = ref init in
  let vis = new fold_typ_basic f racc in
  Visitor.visitFramacFileSameGlobals vis (Ast.get());
  !racc

Don't do the same work twice

This first version, that is barely more than 10 LoC, works, but we can do a little better. Indeed, f will be called each time a type is encountered in the AST. In most cases, we want to call f once for any given type. This can be done quite simply by memoizing in an instance variable of our visitor the set of types encountered thus far. Frama-C's Cil_datatype module (cil/src/cil_datatype.mli) provides all the needed functions for that:

class fold_typ f acc =
object
  inherit Visitor.frama_c_inplace
  val mutable known_types = Cil_datatype.Typ.Set.empty
  method vtype ty =
    if Cil_datatype.Typ.Set.mem ty known_types then Cil.DoChildren
    else begin
      known_types <- Cil_datatype.Typ.Set.add ty known_types;
      acc:= f ty !acc;
      Cil.DoChildren
    end
end

Testing the infrastructure

It is now time to test if everything works smoothly. The following function will print the name and size of the type who has the biggest size in the analyzed program.

let test () =
  let f ty (maxty,maxsize as acc) =
    try
      let size = Cil.sizeOf_int ty in
      if size > maxsize then (ty,size) else acc
    with Cil.SizeOfError _ -> acc
  in
  let (ty,size) = fold_typ f (Cil.voidType,0) in
  Format.printf "Biggest type is %a,@ with size %d@." 
    !Ast_printer.d_type ty size

Since it is only a quick test, we don't do anything special if Cil complains that it cannot compute the size of a given type: we just stick to the maximal value computed so far.

File fold_typ.ml provides the code for the visitor and the test function. frama-c -load-script fold_typ.ml file.c should output something like

[kernel] preprocessing with "gcc -C -E -I.  file.c"
Biggest type is struct Ts,
with size 44

Exercises

  1. How can you use the fold_typ class to define an iter_typ function that apply a function f returning unit to each type of the AST (val iter_typ: (Cil_types.typ -> unit) -> unit)?
  2. Writing fold_typ is a bit overkill if you're going to apply it once in your plug-in. Write a specialized visitor that will do the same thing as the test function above.

Notes

[1] For those who are not familiar with functional programming: http://en.wikipedia.org/wiki/Fold_%...