Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - facetious-colleagues

Entries feed - Comments feed

Saturday, February 16 2013

From Facebook to Silent Circle, through the “metrics” Frama-C plug-in

From Facebook to Silent Circle

Some computers at Facebook were recently compromised because of a zero-day in Java. Nothing unexpected. Last december, instead of writing a full blog post, I lazily linked to Robert Graham predicting this sort of thing for the year 2013.


Speaking of Facebook, do you know who cares for your privacy? The good people at Silent Circle. They provide end-to-end encryption to use on your mobile phone, so that the only people to hear your discussions with your loved ones or business partners are you and them, and perhaps passengers in the same compartment as you, if you are this sort of sociopath.


Speaking of Robert Graham, he has written another excellent post, on the dangers of only focusing on the obviously important, the hard-to-get-right but known-to-be-hard-to-get-right cryptographic stuff, and neglecting the mundane, such as parsing. He takes the example of source code recently released by Silent Circle, finding traces of amateurism in the implementation of SDP. This is not were the hard cryptographic stuff takes place. But because this is a C++ implementation, any exploitable flaw at this level could have deep security implications. A buffer overflow could result in execution of code chosen by the attacker, and the attacker-chosen code could disable the cryptography entirely, or worse.

An attack on the protocol implementation is the equivalent of a $5 wrench in a well-known cryptography-related XKCD comic that probably does not require linking to at this point.

A mysterious commit message

[Metrics] Fix most complicated plugin ever, metrics (globals counted twice if declared and defined)

The above is the message attached by a facetious colleague to commit 21435 during the just elapsed week. After inquiry, the message is at least one-third sarcastic. Still, no-one ever said that the metrics plug-in should be simpler or easier to get right on first try than any of the other Frama-C plug-ins. And in fact, this was neither the first nor the last minor issue fixed in that plug-in last week.


Frama-C's metrics plug-in gathers various measurements of a syntactic nature. The “metrics” plug-in can help check how much of an unfamiliar codebase has actually been analyzed with a plug-in of a more semantic flavor, such as the value analysis. It is used, in context, in the next section of this post.

One reason “metrics” as hard to get right as any other Frama-C plug-in is that it deals with syntax. Semantic plug-ins such as the value analysis only need manipulate abstract objects such as functions and variables. In the metrics plug-in, details matter, such as the nuance between the declaration (in a .h header file) and the definition (in a .c file) of the same function (that was the bug being fixed in commit 21435), or the subtle difference between the source location of a variable and the source location of its initializer (that was another bug fixed this week).


Metaphorically, the metrics plug-in is to the Frama-C platform what the SDP parser is to an end-to-end encryption solution. It is not sexy, but it is just as hard to get right and as essential as the rest.

Using the metrics plug-in

Life being what it is, we sometimes find ourselves analyzing very unfamiliar code. I am talking code without documentation here, and often without either the hardware or a minimal description of the hardware that would be necessary to understand what behaviors can be exercised in the source code. This is one case when the metrics plug-in helps either anticipate how much work will be necessary, or how much has already been done.


Inspired by Robert Graham's post, I converted the Silent Circle SDP parser from C++ to C so as to see whether I could not find a bug in it using Frama-C's value analysis. At some point I had this modified parseSDP.c file and this modified sdp.h file. The originals can be seen here for parseSDP.c and sdp.h.

$ frama-c -cpp-command "gcc -C -E -I. -I `frama-c -print-share-path`/libc" parseSDP.c -metrics
[kernel] preprocessing with "gcc -C -E -I. -I /usr/local/share/frama-c/libc  parseSDP.c"
parseSDP.c:120:[kernel] warning: Calling undeclared function ipstr2long. Old style K&R code?
parseSDP.c:121:[kernel] warning: Calling undeclared function reverseIP. Old style K&R code?
parseSDP.c:328:[kernel] warning: Calling undeclared function u. Old style K&R code?
[metrics] Defined functions (6)
          =====================
           findMediaId (0 call); hasAttrib (0 call); findCrLf (4 calls);
           findip (3 calls); getMediaParams (1 call); parseSDP (0 call); 
          
          Undefined functions (10)
          ========================
           printf (1 call); strlen (1 call); strncmp (6 calls); atoi (2 calls);
           isdigit (4 calls); islower (1 call); isxdigit (1 call); ipstr2long (1 call);
           reverseIP (1 call); u (4 calls); 
          
          Potential entry points (3)
          ==========================
           findMediaId; hasAttrib; parseSDP; 
...

This snapshot arrives at a good time to use the metric plug-in, as above. The plug-in says that good entry points to start the analysis from are findMediaId(), hasAttrib() and parseSDP(). The heuristic here is that these functions are provided and aren't called, so they are likely to be part of the API.

The metrics plug-in also says that the only missing functions are those from printf() to isxdigit() (these are part of the standard library), u() (I introduced this function myself when converting from C++ to C), and ipstr2long() and reverseIP(), that are the only two that require digging up from elsewhere in the codebase. At that point I decided that the functions probably weren't important and that I could let Frama-C infer some sort of behavior for them based on their respective types.

The $5-wrench kind of flaw in the verification process would be to fail to notice that an important function is missing. Then the value analysis might report few or no issues, not because the code is safe, but because the analysis failed to visit most of it. In another scenario, the value analysis might fail to visit a large part of the code because there is an error early on that makes it look like the rest is unreachable. The “metrics” plug-in is the best safeguard we have against this kind of mistake, which is easy to make when verifying unfamiliar code.

A quick look at function parseSDP()

I finally used the commandline:

$ frama-c -cpp-command "gcc -C -E -I. -I `frama-c -print-share-path`/libc" parseSDP.c \
      -lib-entry -main parseSDP -context-valid-pointers -context-width 20 -val


Here are a few alarms found this way, cherry-picked as interesting because they are caused directly from the contents of the buffer pointed to by buf, that we assume to be under an attacker's control for this exercise.

Some alarms seem by contrast to be caused by the contents of *psdp. These are not as interesting to investigate, because that structure probably has some invariants it is supposed to satisfy, so the alarms we get there are only caused by the lack of context.

From the way it is handled inside parseSDP(), it seems that buf is only assumed to point to a zero-terminated string. It is a good way to pass a Sunday afternoon to see whether these alarms correspond to a real danger of buffer overflow for some input.

parseSDP.c:84:[kernel] warning: out of bounds read. assert \valid_read(buf+1);
parseSDP.c:82:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:162:[kernel] warning: out of bounds read. assert \valid_read(buf);
parseSDP.c:165:[kernel] warning: out of bounds read. assert \valid_read(buf-2);
parseSDP.c:167:[kernel] warning: out of bounds read. assert \valid_read(buf+1);
...
parseSDP.c:262:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:291:[kernel] warning: out of bounds read. assert \valid_read(buf);
parseSDP.c:294:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:300:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:303:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:308:[kernel] warning: out of bounds read. assert \valid_read(buf);
...
parseSDP.c:308:[kernel] warning: out of bounds read. assert \valid_read(buf);
parseSDP.c:309:[kernel] warning: out of bounds read. assert \valid_read(buf);

Tuesday, October 2 2012

StackOverflow answers everything

One thing leading to another, I recently ended up wondering why, according to an individual close to the situation, the C snippet below is considered defined in C99.

  struct X { int a[5]; } f();
  int *p = f().a;
  printf("%p\n", p);

The programming Q&A website StackOverflow had never failed me before, so I asked there. Lo and behold, less than 24 hours later, a very reasonable explanation appeared.


I would like to thank my colleague Virgile Prevosto for answering my question on StackOverflow.

Wednesday, September 19 2012

Never forget to sanitize your input

This post is a follow up of this one

A facetious colleague pointed out to me that the print_stmt function that is used to display the CFG in the post mentioned above behaves incorrectly when used over code that include string constants, such as the one below:

void f(const char *);

void test(int c) {
  if ("A random string"[c]=='s') f("Hello, world"); else f("Goodbye");
}

Namely, poor dot is a little bit confused by all these double quotes occurring in labels that are themselves delimited by double quotes. It manages to output something, but the result is not really satisfying: cfg_string_cst.png

Thus, it looks as though the first improvement to our little script will be to have a more robust pretty-printing function for statements. Basically, we have to escape those pesky double quotes that might appear when pretty-printing a C expression. Fortunately, OCaml's Printf (hence Format) module does know how to do this, with the "%S" directive. There's one drawback, though: this directive will also put (unescaped) double quotes at the beginning and end of the result, that we'll have to get rid of. In the end, the escaping function is the following:

let protect pretty chan v =
  let s = Pretty_utils.sfprintf "%a" pretty v in
  let s = Pretty_utils.sfprintf "%S" s in
  Format.pp_print_string chan (String.sub s 1 (String.length s - 2))

First, we create a string containing the the unescaped output. Pretty_utils.sfprintf is a Frama-C function that is analogous to Format.sprintf except that 1) it can be fed with the same pretty-printing functions as fprintf when there is a custom "%a" directive and 2) each call uses its own internal buffer, avoiding subtle flushing issues. Second, we escape all the inner double quotes. Finally, we output the result, except for the first and last characters that by definition of "%S" are double quotes.

We can now use protect in print_stmt each time there might be an issue (the complete file is available here):

 let print_stmt out =
  function
-   | Instr i -> !Ast_printer.d_instr out i
+   | Instr i -> protect !Ast_printer.d_instr out i
...
-   | If (e,_,_,_) -> Format.fprintf out "if %a" !Ast_printer.d_exp e
-   | Switch(e,_,_,_) -> Format.fprintf out "switch %a" !Ast_printer.d_exp e
+   | If (e,_,_,_) -> Format.fprintf out "if %a" (protect !Ast_printer.d_exp) e
+   | Switch(e,_,_,_) ->
+       Format.fprintf out "switch %a" 
+         (protect !Ast_printer.d_exp) e

Using this new version of the script, frama-c -load-script cfg_print_escape.ml test.c && dot -Tpng -O cfg.out produces a better result: cfg_string_cst_correct.png

Thursday, September 6 2012

Crediting where credit is due

In a recent post, I showed how to use Frama-C's value analysis to prove a particular liveness property true or false of a particular C program.


My colleague Sébastien Bardin reliably informs me that the ideas for reducing a liveness property to a reachability property were all in the article Liveness Checking as Safety Checking. I had read the article, and the instrumentation I described was inspired by it, but I wasn't sure I understood the article well enough to claim that I was using exactly Biere, Artho and Schuppan's idea. It turns out I was, pretty much.

Thursday, August 23 2012

Technical interlude

The current series of posts is pretty technical, huh? Here is a refreshing diversion in the form of a technical post that also touches on existential questions. What is this blog for? We don't know.

Best solutions, or solutions anyone could have found

If you have been following the current series of posts, you may have noticed that a recurring dilemma is whether the goal is to demonstrate the capabilities of the software tool under consideration (Frama-C's value analysis plug-in) when pushed to its limits by its authors, or to demonstrate what anyone could do without being a creator of the tool, taking advantage of Linux, Mac OS X and sometimes Windows packages for the software, and, most importantly, having only the available documentation to understand what the software makes possible.


The general goal of the blog is to empower users. In fact, much of the aforementioned documentation is available as posts in the blog itself. These posts sometimes describe techniques that were undocumented until then, but describe them so that they wouldn't be undocumented any more. Some of the posts even describe techniques that weren't undocumented at the time of writing. As my colleague Jean-Jacques Lévy would say, “repetition is the essence of teaching”.


However, I do not expect that most software tool competitions are about measuring the availability and learnability of tools. For instance, the SV-COMP competition explicitly forbids anyone other than the authors of the tool to participate with it:

A person (participant) was qualified as competition contributor for a competition candidate if the person was a contributing designer/developer of the submitted competition candidate (witnessed by occurrence of the person’s name on the tool’s project web page, a tool paper, or in the revision logs).

Ostensibly, the tools are the participants in the competition. Involving the authors is only a mean to make the competition as fair as possible for the tools. Some colleagues and I have expressed the opinion that, for static analysis, at this time, this was the right way:

In general, the best way to make sure that a benchmark does not misrepresent the strengths and weaknesses of a tool is to include the toolmakers in the process.


Some competitions on the other hand acknowledge that a participant is a person(s)+tool hybrid, and welcome entries from users that did not create the tools they used. This is perfectly fine, especially for the very interactive tools involved in the cited competition, as long as it is understood that it is not a tool alone, but a hybrid participant, that is being evaluated in such a competition. Some of the participating tools participated in several teams, associated to several users (although the winning teams did tend to be composed of toolmakers using their respective tools).


It is not clear to me where on this axis the RERS competition falls. It is a “free-style” competition. The entries are largely evaluated on results, with much freedom with respect to the way these were obtained. This definitely can also be fun(*).

The competition certainly does not require participating tools to be packaged and documented in such exquisite detail that anyone could have computed the same results. As such, the best description to write is the description of what the authors of the value analysis can do with it with all the shortcuts they trust themselves to take. If some of the questions can be solved efficiently using undocumented features, so be it. It is the free-style style.

But the mission of the blog remains to document, and showing what we have done without explaining how to reproduce it sounds like boasting. I am torn.


(*) An untranslatable French idiom, “bon public”, literally “good audience”, used in an adjective position, conveys a number of subtle, often humorous nuances. I think this applies to me and software competitions. They just all seem fun to me.

A necessary built-in

A value analysis built-in whose usefulness, if it were available, became apparent in the last post was a built-in that halts the analyzer.

Several analysis log snippets in the previous post show a ^C as last line, to indicate that the log already contains the answer to the question being investigated, and that the user watching the analysis go by can freely interrupt it then.

I cheated when editing the post, by the way. I am not fast enough to interrupt the analysis just after the first conclusive log message. But I did interrupt these analyses, which otherwise might have gone on for a long time for no purpose.

I thought I would enjoy my colleagues' horrified reaction so I told everyone about my intention of writing a value analysis built-in, like some already exist for memcpy() or memset(), for halting the analyzer. Not halting the currently considered execution, mind you. There is /*@ assert \false; */ for that, and if you need this functionality available as a C function, you can put /*@ assert \false; */ inside your own function, like I did for assert() earlier when prepping the competition programs for Frama-C.


Clearly, such a built-in does not have a contract, its effects are meta-effects that transcend the framework (in what would, for these colleagues, be the bad sense of the word).

A time-saver

Today I set out to write this built-in, because I started to get serious about the competition and it would be convenient to just instrument the program with:

if ( /* current state identical to predecessor state /* )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_halt_analysis();
}

instead of having an operator, or even a shell script, watch the log for the Frama_C_show_each_cycle_found message and interrupt Frama-C then.


Soon after opening the file and finding my place, I realized that this feature was already available. I am telling you this because this blog is for the benefit of transparency, for documenting things and for empowering users, but mostly because I want to see my colleagues' horrified reaction when they read about this trick. Actually, I would not use it if I were you. It might cease to work in a future version.

if ( /* current state identical to predecessor state /* )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_cos(0.0, 0.0);
}

You see, Frama-C value analysis built-ins, such as Frama_C_cos(), do not have an intrinsic type. You can include a header that will give them a type if you wish:

double Frama_C_cos(double x);

Otherwise, the old C89 rules for missing function prototypes apply. During parsing, a type is inferred from the types of the arguments at the call site.

Only if and when the analysis actually reaches the Frama_C_cos(0.0, 0.0) call, the analyzer will realize that it is passed arguments that are incompatible with the OCaml function that implement the built-in. An exception will be raised. Since there is no reason to try to recover from a built-in misuse, this exception is caught only at the highest level. It stops the entire analyzer (it marks the entire analysis results as unreliable, as it should, but of course it does not retroactively unemit the Frama_C_show_each_cycle_found message in the log that says a conclusion has been reached).


The result looks like this, this time without cheating:

[value] Called Frama_C_show_each_cycle_found({1})
...
[value] computing for function any_int <- main.
        Called from Problem6_terminateU_pred.c:9516.
[value] Done for function any_int
[value] Semantic level unrolling superposing up to 1400 states
[value] user error: Invalid argument for Frama_C_cos function
[value] user error: Degeneration occured:
        results are not correct for lines of code
        that can be reached from the degeneration point.
[kernel] Plug-in value aborted because of invalid user input.

This is convenient for timing the analyzer as it tackles some of the harder problems. In the case above, the time was 5m35s answering a liveness question about Problem 6. It would have been a pain to watch the log for that long, fingers poised to interrupt the analyzer.


That's all for this interlude. Expect exciting timings in the next post. There may also be talk of undocumented features that transcend the problem, in the good sense of the word.

Friday, June 8 2012

To finish on termination

Enthusiastic reactions to the previous post about verifying the termination of a C program with Frama-C's value analysis lead me to write this post-scriptum.

On detecting both termination and non-termination

The scheme I outlined in the previous post can only confirm the termination of the program. If the program does not terminate, the analysis method I outlined runs indefinitely. I recommended to use a timeout and to interpret reaching the timeout as the answer “I don't know”. Off-line, a colleague complained about this behavior.


This colleague was, of course, being facetious. Frama-C's value analysis, when used normally (without option -obviously-terminates) is already a sound detector of non-termination. Consider the program below:

unsigned char u = 1;

int main(){
  while (u)
    {
      u = u + 14;
    }
  return u;
}

If you can get Frama-C's value analysis to print that the function is a “NON TERMINATING FUNCTION”, it means that the function is guaranteed not to have any defined terminating executions (to put it simply, the function is guaranteed not to terminate). In this case, it is easy:

$ frama-c -val t4.c
[value] Values at end of function main:
          NON TERMINATING FUNCTION

The values displayed as possible at the end of the function are sound over-approximations of the values that are possible in real (defined) executions. When the value analysis says that there are no possible values, you can trust that there are no real executions that reach the end of the function.


For more difficult cases, I recommend option -slevel to improve the chances that the analyzer will detect non-termination. The guarantees are the same as before: if the value analysis says “NON TERMINATING FUNCTION”, no real execution reaches the end of the function. If it describes possible values for variables at the end of the function, it mean that, because of over-approximations, it doesn't know whether the end of the function is reached at all.


unsigned char u;

int main(){
  while (u * u != 17)
    {
      u = u + 1;
    }
  return u;
}

The program above looks for a number whose square is 17. With the default settings, the value analysis is unable to decide whether the program terminates or not:

$ frama-c -val t5.c
...
[value] Values at end of function main:
          u ∈ [--..--]
          __retres ∈ [0..255]


Let us now analyze the program more precisely with option -slevel 300:

~ $ frama-c -slevel 300 -val t5.c
...
[value] Values at end of function main:
          NON TERMINATING FUNCTION

That's better: the above result means that the value analysis can guarantee that the program will keep searching for an unsigned char whose square is 17 until the end of the world.


To summarize:

  • If you want to verify non-termination, you can hope to do so with Frama-C's value analysis in normal mode.
  • If you want to verify termination, you can hope to do so with Frama-C's value analysis with option -obviously-terminate.
  • If you don't know whether the target program terminates or not and you want to chance to detect both, then you can launch both frama-c -val and frama-c -obviously-terminates -val in parallel and see whether one of them is able to conclude one way or the other.
  • If you want an analyzer that always tells in finite time whether a given program terminates or not, and although a C program without dynamic allocation is not exactly similar to a Turing machine, you may be asking too much.

On the Collatz conjecture

An expedient shortcut, when trying to explain undecidability to non-computer scientists, is to encode a famous, well-researched mathematical conjecture into a computer program, ideally one with a large monetary prize attached to it. It works well as an intuitive explanation of the difficulty of saying anything about the dynamic behavior of arbitrary programs. The argument is that if it was easy to tell whether an arbitrary program does something or the other, some mathematician would have taken the opportunity to become famous, and possibly rich, by encoding the conjecture as a computer program and analyzing this program. This is, of course, an informal meta-argument. Perhaps all mathematicians are stupid; we don't know about that. But I find this explanation of undecidability works better than any other I have tried.

A long time ago, I was using Fermat's last theorem for the above purpose. When it started to be considered satisfactorily proved, I switched to Goldbach's conjecture. But Goldbach's conjecture requires enumerating prime numbers. To encode it into the simplest possible program for the purpose of an informal explanation, one needs to write at least a function that recognizes a prime number, and explain that this is good enough to encode the conjecture into a program.


All this time, I should have been using the Collatz conjecture (that you may also know as “Syracuse problem”). My facetious colleagues reminded me of this in the comments to the previous post.


Let us humor these colleagues:

unsigned long long x;

unsigned long long input();

main(){
  x = input();
  while (x > 1)
    {
      if (x%2 == 0) x = x/2; else x = 3*x + 1;
    }
}

Trying to prove termination:

$ frama-c -obviously-terminates -val syracuse.c 
...
[value] Semantic level unrolling superposing up to 53600 states
[value] Semantic level unrolling superposing up to 53700 states
[value] Semantic level unrolling superposing up to 53800 states
^C[kernel] User Interruption (Ctrl-C)

Trying to prove non-termination:

$ frama-c -slevel 44444 -val syracuse.c
...
[value] Values at end of function main:
          x ∈ {0; 1}
          __retres ∈ {0}

Both attempts failed. Ah well, at least we tried.

Tuesday, May 22 2012

Iterating over the AST

Context

A facetious colleague who claims that he should be better writing his thesis but keeps committing Coq and OCaml files on Frama-C's repository, asked me the following question: Is there a function in Frama-C's kernel that can fold[1] a function over all types that appear in a C program?

As it turns out, nothing is readily available for this exact purpose, but another facetious colleague could have come up with a terse answer: There's a visitor for that! Now, we can be a bit more helpful and actually write that visitor. Our end goal is to come up with a function

fold_typ: (Cil_types.typ -> 'a -> 'a) -> 'a -> 'a

so that fold_typ f init will be f t_1 (f t_2 (... (f t_n init)...)), where the t_i are all the C types appearing in a given C program (in no particular order).

Frama-C's visitors

The visitor pattern is a well-known object-oriented design pattern that can be used to perform a given action over all nodes of a complex data structure (in our case the Abstract Syntax Tree -AST for short- of a C program). Frama-C provides a generic visitor mechanism, built upon CIL's visitor, whose entry points can be found in the aptly named src/kernel/visitor.mli file. It is also documented in section 5.14 of the developer manual. In summary, you first define a class that inherits from the generic visitor and overrides the methods corresponding to the nodes you're interested in (in our case, this will be vtype for visiting uses of Cil_types.typ). Then you apply an object of this class to the function from the Visitor module that correspond to the subpart of the AST that you want to inspect (in our case, this will be the whole AST).

Basic version

The standard visitor does not provide anything to return a value outside of the AST. In fact, all the entry points in Visitor return a node of the same type that the node in which the visit starts (for visitors that don't perform code transformation, this is in fact physically the same node). But if you accept to sneak in a little bit of imperative code into your development -and by using Frama-C you've already accepted that- there is an easy way out: pass to the visitor a reference that it can update, and you just have to read the final value that reference is holding after the visit. The visitor then looks like the following:

class fold_typ_basic f acc =
object
  inherit Visitor.frama_c_inplace
  method vtype ty = acc:= f ty !acc; Cil.DoChildren
end

And that's it. Each time the visitor sees a type, it will apply f to ty and the result of the previous computations, stored in acc. fold_typ then just needs to call the visitor over the whole AST and give it a reference initialized with init:

let fold_typ f init =
  let racc = ref init in
  let vis = new fold_typ_basic f racc in
  Visitor.visitFramacFileSameGlobals vis (Ast.get());
  !racc

Don't do the same work twice

This first version, that is barely more than 10 LoC, works, but we can do a little better. Indeed, f will be called each time a type is encountered in the AST. In most cases, we want to call f once for any given type. This can be done quite simply by memoizing in an instance variable of our visitor the set of types encountered thus far. Frama-C's Cil_datatype module (cil/src/cil_datatype.mli) provides all the needed functions for that:

class fold_typ f acc =
object
  inherit Visitor.frama_c_inplace
  val mutable known_types = Cil_datatype.Typ.Set.empty
  method vtype ty =
    if Cil_datatype.Typ.Set.mem ty known_types then Cil.DoChildren
    else begin
      known_types <- Cil_datatype.Typ.Set.add ty known_types;
      acc:= f ty !acc;
      Cil.DoChildren
    end
end

Testing the infrastructure

It is now time to test if everything works smoothly. The following function will print the name and size of the type who has the biggest size in the analyzed program.

let test () =
  let f ty (maxty,maxsize as acc) =
    try
      let size = Cil.sizeOf_int ty in
      if size > maxsize then (ty,size) else acc
    with Cil.SizeOfError _ -> acc
  in
  let (ty,size) = fold_typ f (Cil.voidType,0) in
  Format.printf "Biggest type is %a,@ with size %d@." 
    !Ast_printer.d_type ty size

Since it is only a quick test, we don't do anything special if Cil complains that it cannot compute the size of a given type: we just stick to the maximal value computed so far.

File fold_typ.ml provides the code for the visitor and the test function. frama-c -load-script fold_typ.ml file.c should output something like

[kernel] preprocessing with "gcc -C -E -I.  file.c"
Biggest type is struct Ts,
with size 44

Exercises

  1. How can you use the fold_typ class to define an iter_typ function that apply a function f returning unit to each type of the AST (val iter_typ: (Cil_types.typ -> unit) -> unit)?
  2. Writing fold_typ is a bit overkill if you're going to apply it once in your plug-in. Write a specialized visitor that will do the same thing as the test function above.

Notes

[1] For those who are not familiar with functional programming: http://en.wikipedia.org/wiki/Fold_%...

Wednesday, March 14 2012

But wait! There may be more!

My colleague Boris Yakobowski is on holidays, and while leisurely browsing the internet, he seems to have stumbled upon my previous post. He e-mailed me about it.

For some reason, it does not feel right to provide much biography when mentioning facetious colleagues, perhaps because it would look too self-congratulatory from the outside. Just this once, a little bit of context appears to be necessary: Boris is the person who has, using Frama-C's value analysis, managed to analyze the largest codebase to date. Without qualification, this does not mean much: anyone can type "frama-c -val *.c" in a directory and call it an analysis. But there is a point at which one can convincingly argue that the analysis is meaningful and that the results obtained are the best the tool can give. The latter, for instance, means all remaining alarms have been examined, and for each, the verifier has some confidence that it is false and an explanation why the analyzer emits the alarm instead of doing the precise thing and not emitting it. In short, Boris has reached this point for a pretty big piece of embedded software.


Boris, who knows everything about false alarms and generally Frama-C's value analysis' limitations, claims that the example from the last post is too simple and suspects me to have doctored it to make it analyze optimally in “only” 7 minutes. How unfair!


The example in last post comes straight from Omar Chebaro's PhD thesis. Omar worked during his PhD on SANTE (more information here), a Frama-C plug-in that combines results from the value analysis plug-in and concolic test-case generation with PathCrawler for, one, identifying false alarms when possible and two, providing input vectors for true alarms (again, when possible). There is no reason to doubt Omar's scientific integrity, but even if he had a slight bias, the bias would not necessarily be to make things easy for the value analysis. He might just as well be tempted to make the value analysis emit more alarms in order to show the necessity of sorting them out with dynamic analysis.


I do have a bias, and my motive in last post is clearly to show that the value analysis, on that example, with the right options, can limit its list of alarms to a single one, which happens to be a true alarm. But I did not choose the example, nor did I modify it to make it easier to analyze.


Note that SANTE is still useful in conditions where the value analysis emits a single alarm: at the end of last post, we only know that the alarm is true because of a comment in the source code telling us so. Using PathCrawler to classify that alarm gives SANTE a chance to confirm that it is true and to provide an input vector that the developer would certainly appreciate, if only for debugging purposes.

Friday, February 10 2012

Checking for overflows operation by operation

My colleague Bernard Botella pointed out an interesting example in an offline discussion following the last quiz.

The setup

Consider the snippet:

int s;
unsigned u1, u2;
...
s = u1 - u2;

The programmer's intention with the assignment is to compute, in variable s of type int, the mathematical difference between the value of u1 and the value of u2. You would expect that, using automatic plug-ins in Frama-C, ey would be able to check that the initial values for u1 and u2 are in ranges for which this is always what happens.

In fact, this looks like a perfect assignment for Rte and the value analysis. Rte can generate assertions for the various conversions and the value analysis can check whether the conditions are satisfied, following the method outlined in a recent post.

This is, after all, exactly what static analyzers — and coding rules that forbid overflows — are for.

An obvious false positive

The overflow in the subtraction u1 - u2 looks like it can be justified away. After all, such a warning is emitted when u1==3 and u2==5, and s will still receive the expected value -2 after the implementation-defined conversion from the value of expression u1 - u2, that is, UINT_MAX - 1, to int.

The programmer may think “okay, I can see why I get this warning, it happens as soon as u2 is larger than u1, but this is defined and, in my case, innocuous. I only want to check that the mathematical difference fits in s. The check about the absence of overflow in the conversion to int is what I am really interested in.”


Say that the programmer is using the Rte plug-in to express the conditions for both the subtraction and the implicit conversion to int to be safe, pretty-printing an annotated program as a result:

$ frama-c -rte-unsigned-ov -rte t.c -print
...
void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  /*@ assert rte: u1-u2 ≥ 0; */
  s = (int)(u1 - u2);
  return;
}

Since the programmer has seen that the second assertion, about an overflow in the subtraction u1 - u2, is too restrictive, preventing use with u1==3 and u2==5, ey removes it. The programmer instead focuses on making sure that there is no overflow in the conversion to int of the difference, and feels confident if ey sees that there isn't any warning about that. It seems normal, right?

void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  s = (int)(u1 - u2);
  return;
}

But…

Consider the case where u1 is very small, u2 is very large, and the value analysis has been able to infer this information relatively precisely. We assume 0 ≤ u1 ≤ 10 and 3000000000 ≤ u2 ≤ 4000000000 for this example:

/*@ requires 0 <= u1 <= 10 ;
    requires 3000000000 <= u2 <= 4000000000 ; */
void main(void)
{
  /*@ assert rte: (unsigned int)(u1-u2) ≤ 2147483647; */
  s = (int)(u1 - u2);
  return;
}

In this case, the value analysis does not warn about the conversion to int ...

$ frama-c -val -lib-entry u.c
...
u.c:8:[value] Assertion got status valid.

... but the value in s is not the mathematical difference between u1 and u2.

Conclusion

The conclusion is that looking at overflows atomically is not a silver bullet. Granted, if all possible overflows in the program are verified to be impossible, then machine integers in the program observationally behave like mathematical integers. But the more common case is that some overflows do in fact happen in the program and are justified as benign by the programmer. The issue pointed out by Bernard Botella is that it is possible to justify an overflow, for which the rule seems too strict, by exhibiting input values for which the expected result is computed, and then to be surprised by the consequences.

Perhaps in another post, we can look at ways around this issue.

Friday, January 20 2012

Constants quiz

What does the following program print?

long long l;
#include <stdio.h>

int main()
{
  l = -0x80000000;
  printf("%lld\n", l);
}

$ gcc t.c && ./a.out 

And this one? (beware, trick question)

long long l;
#include <stdio.h>

int main()
{
  l = -2147483648;
  printf("%lld\n", l);
}

$ gcc t.c && ./a.out 

What, "it's too difficult!"? Okay, here is a hint in the form of a third question:

long long l;
#include <stdio.h>

int main()
{
  l = -2147483648;
  printf("%lld\n", l);
}

$ gcc -std=c99 t.c && ./a.out 

You are probably reading this on a computer on which the answers are "2147483648", "2147483648" and "-2147483648".

- page 1 of 2