Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Wednesday, September 19 2012

Never forget to sanitize your input

This post is a follow up of this one

A facetious colleague pointed out to me that the print_stmt function that is used to display the CFG in the post mentioned above behaves incorrectly when used over code that include string constants, such as the one below:

void f(const char *);

void test(int c) {
  if ("A random string"[c]=='s') f("Hello, world"); else f("Goodbye");
}

Namely, poor dot is a little bit confused by all these double quotes occurring in labels that are themselves delimited by double quotes. It manages to output something, but the result is not really satisfying: cfg_string_cst.png

Thus, it looks as though the first improvement to our little script will be to have a more robust pretty-printing function for statements. Basically, we have to escape those pesky double quotes that might appear when pretty-printing a C expression. Fortunately, OCaml's Printf (hence Format) module does know how to do this, with the "%S" directive. There's one drawback, though: this directive will also put (unescaped) double quotes at the beginning and end of the result, that we'll have to get rid of. In the end, the escaping function is the following:

let protect pretty chan v =
  let s = Pretty_utils.sfprintf "%a" pretty v in
  let s = Pretty_utils.sfprintf "%S" s in
  Format.pp_print_string chan (String.sub s 1 (String.length s - 2))

First, we create a string containing the the unescaped output. Pretty_utils.sfprintf is a Frama-C function that is analogous to Format.sprintf except that 1) it can be fed with the same pretty-printing functions as fprintf when there is a custom "%a" directive and 2) each call uses its own internal buffer, avoiding subtle flushing issues. Second, we escape all the inner double quotes. Finally, we output the result, except for the first and last characters that by definition of "%S" are double quotes.

We can now use protect in print_stmt each time there might be an issue (the complete file is available here):

 let print_stmt out =
  function
-   | Instr i -> !Ast_printer.d_instr out i
+   | Instr i -> protect !Ast_printer.d_instr out i
...
-   | If (e,_,_,_) -> Format.fprintf out "if %a" !Ast_printer.d_exp e
-   | Switch(e,_,_,_) -> Format.fprintf out "switch %a" !Ast_printer.d_exp e
+   | If (e,_,_,_) -> Format.fprintf out "if %a" (protect !Ast_printer.d_exp) e
+   | Switch(e,_,_,_) ->
+       Format.fprintf out "switch %a" 
+         (protect !Ast_printer.d_exp) e

Using this new version of the script, frama-c -load-script cfg_print_escape.ml test.c && dot -Tpng -O cfg.out produces a better result: cfg_string_cst_correct.png

Tuesday, September 4 2012

Extracting information from Frama-C programmatically

Extending the framework

Some time ago, one of our fellow users asked us whether it was possible to extract the control-flow graph (CFG) of C functions from Frama-C. Fact is, although the CFG is computed during the elaboration of the AST, nothing exists currently to present the information to the user. But Frama-C is a Flexible framework, and a small (50 LoC) script will provide this functionality albeit in a rather crude form. More specifically, we'll ask Frama-C to output a file in the dot format that will contain a graph for each of the functions of the program under analysis. Basically, our script is composed of three parts: a function to pretty-print the statements as node of the graph, a visitor to create the graphs, and the boilerplate code that takes care of registering the code into Frama-C kernel.

Pretty-printing

Frama-C already has a function to pretty-print statements, but it is not suitable for us, as it will recursively print substatements of compound statements (blocks, if, while, ...), while we only want to have a label for the current statement: substatements will be represented by other nodes. We'll use thus the following small function:

let print_stmt out =
 function
   | Instr i -> !Ast_printer.d_instr out i
   | Return _ -> Format.pp_print_string out "<return>"
   | Goto _ -> Format.pp_print_string out "<goto>"
   | Break _ -> Format.pp_print_string out "<break>"
   | Continue _ -> Format.pp_print_string out "<continue>"
   | If (e,_,_,_) -> Format.fprintf out "if %a" !Ast_printer.d_exp e
   | Switch(e,_,_,_) -> Format.fprintf out "switch %a" !Ast_printer.d_exp e
   | Loop _ -> Format.fprintf out "<loop>"
   | Block _ -> Format.fprintf out "<enter block>"
   | UnspecifiedSequence _ -> Format.fprintf out "<enter unspecified sequence>"
   | TryFinally _ | TryExcept _ -> Format.fprintf out "<try>"

Creating the graphs

In order to create our output, we must make a pass through the whole AST. An easy way to do that is to use CIL's visitor mechanism. There are three kinds of nodes where we have something to do. First, at the file level, we create the whole graph structure:

method vfile f =
    Format.fprintf out "@[<hov 2>digraph cfg {@\n";
    ChangeDoChildrenPost (f,fun f -> Format.fprintf out "}@."; f)

Then, for each function, we encapsulate the CFG in a subgraph (and don't do anything for other globals, hence the SkipChildren for the other branch of the pattern-matching).

  method vglob_aux g =
    match g with
      | GFun(f,loc) ->
          Format.fprintf out "@[<hov 2>subgraph cluster_%a {@\n\
                             graph [label=\"%a\"];@\n" 
            Cil_datatype.Varinfo.pretty f.svar
            Cil_datatype.Varinfo.pretty f.svar;
          ChangeDoChildrenPost([g], fun g -> Format.fprintf out "}@\n@]"; g)
      | _ -> SkipChildren

Last, for each statement, we have to create a node of the graph, and create the edges toward its successors:

  method vstmt_aux s =
    Format.fprintf out "s%d@[[label=\"%a\"]@];@\n" s.sid print_stmt s.skind;
    List.iter 
      (fun succ -> Format.fprintf out "s%d -> s%d;@\n" s.sid succ.sid)
      s.succs;
    DoChildren

Registration within Frama-C's main loop

Now, we want Frama-C to be aware of our script. Basically, this amounts to write a driver function of type unit -> unit that will launch the visitor process, and to tell Frama-C that this function should be run as an analysis:

let run () =
  let chan = open_out "cfg.out" in
  let fmt = Format.formatter_of_out_channel chan in
  Visitor.visitFramacFileSameGlobals (new print_cfg fmt) (Ast.get())

let () = Db.Main.extend run

Launching the script

In order to obtain a cfg.out file with our graphs, we simply have to call frama-c that way:

frama-c -load-script cfg_print.ml [other_options] file.c [file2.c] 

Then, provided graphviz is installed, the command

dot -Tpdf -o cfg.pdf cfg.out

will provide a pdf file similar to this one

What next?

This script is very basic, and should be improved in several ways in order to be useful beyond the point of showing how to interact with Frama-C's API (some of these might be the subject of new posts in the future)

  • The graphs could be fancier, in particular by distinguishing between branching nodes and plain ones, or showing exit of blocks as well as their beginning
  • For now, the graphs are generated each time the script is explicitely loaded from the command line. One might eventually want to compile it as a proper plugin, systematically loaded by Frama-C, with the graph generation commanded by a proper command-line option
  • Some other parameters might be interesting, such as choosing the name of the output file, including only the CFG of certain functions, ...
  • The script could interact with some plugins to give more information (e.g. showing dead paths for a given run of Value)