Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tuesday, March 18 2014

Nginx buffer overflow

A buffer overflow has been discovered in recent versions of the HTTP server Nginx.

Hacker News user jmnicolas pondered out loud: “I wonder if this discovery is a result of OpenBSD switching its focus from Apache to Nginx?”

It took me one minute to understand what ey meant. I was actually wondering why OpenBSD would be putting buffer overflows in Nginx, now that they are done putting buffer overflows in Apache. I mean, surely there is room for one more buffer overflow in Apache?


The analysis of what this means about this industry and/or me is left as an exercise to the reader.

Sunday, February 23 2014

An interesting SSL implementation bug: CVE-2013-5914

SSL in the news

SSL is a protocol for point-to-point confidential and authenticated communication over an insecure medium. It is the protocol behind HTTPS, among many other uses. In an Internet-connected system, the SSL implementation stands at the frontier between the system and the hostile outside world. For this reason, SSL implementation flaws are a prime target for attacks.

An ordinary bug

A rather banal SSL implementation bug was revealed over the weekend. A duplicated line in a crucial, if ordinary-looking, sequence of validation operations means that some of the validation steps are not taken:

    ...
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail; // <-------------------------------------------------
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Note that the second consecutive goto fail is executed with variable err containing the result of the last validation operation, 0, indicating success. This value is returned as-is by the function; the caller is therefore mislead into believing that the validation succeeded, despite some of the validation steps not having been executed.

Because C lacks an exception mechanism, the above is an often-seen programming pattern. The programming style here can hardly be blamed: this is how the best C programs are written. Except of course for the extraneous goto.

The SSL implementation, and thus the bug in question, are found in tens of millions of iOS devices, with a few additional Mac OS X computers on top of that. The scope of the security problem caused by this bug, and the obviousness of the issue when pointed out, have together lead to much commentary on Twitter and other popular discussion forums.

So reaction. Very noise

“I would never have had this problem because I know that goto is bad”, some commenters claim. I wish I was caricaturing, but I am unfortunately only paraphrasing and combining several public reactions into one.


“I would never have had this problem because I use braces”, essentially state others. Certainly, the root cause of the problem must have been that the developer who introduced the bug was confused about the meaning of if (c) stmt1; stmt2;. Just look how ey indented it!

These two often-heard remarks strongly suggest to use brace-less control flow or the presence of goto as predictors of defects in C code. I am sure this will be a fruitful research direction. Someone from the software engineering academic community should look into it.


“Just adding a single line of code can bring a system to its knees”, reminds Arie van Deursen. True, true, an important lesson there. We should all insert the following line somewhere in our respective codebases from time to time and take a good look at the effect it has, in remembrance.

(


It is Ariane 5 all over again! Worse, instead of just the makers of formal verification systems, everyone seems to have a scheme to prevent the problem they already know is there.

An interesting bug in a different SSL implementation

The problem in most of the armchair analysis that has been going on on the Internet lies in the two following questions: how many more bugs in security-critical C code does the proposed remedy (be it more braces, fewer gotos, …) find? How many time-costly, tedious, soul-crushingly boring false positives does it uncover?

Since commenters have been generalizing on the basis of a population of one sample, I feel no shame in presenting my own single example, raising the total number of scrutinized bugs to two. Note that, for us to make statistically sound arguments, we will eventually need to extend the discussion to examples of correct code that we do not wish to change.

Until then, here is a funny SSL implementation bug. It is characterized as follows, in a version of PolarSSL 1.1.8 that colleagues and I have been modifying.

Screenshot 1: an alarm in our tis_recv() function?

CVE-2013-5914-1.png

PolarSSL expects the program in which it is incorporated to provide it with a function that receives data from the outside world and writes it to a memory buffer. We have made such a function, baptized it tis_recv, and we have set up PolarSSL to use it.

The function tis_recv takes three arguments. The first one is a context argument in case someone needs one (our function ignores this argument). Second is a pointer to the buffer where the data is to be written, then third is the maximum size the function is allowed to write there.

We have specified our function tis_recv thus:

/*@ 
  requires recv_r1: \valid(output+(0..output_len-1)) ;
  requires recv_r2: output_len > 0 ; 
*/
int tis_recv(void* p, unsigned char * output, size_t output_len);

The screenshot indicates on the bottom right that the pre-condition recv_r1, which states that the argument output points to a buffer large enough for output_len characters, is not verified. How could this be? Surely a false positive… Or someone is calling the function with the wrong arguments. It does not look like the problem is in our function.

The GUI informs us that the function tis_recv is called in one place only, and that place is inside ssl_fetch_input(). It is called through a function pointer stored inside a member of a struct accessed through a pointer. The GUI tells use that we can mentally substitute ssl->f_recv(..) with tis_recv(...).

Screenshot 2: a wrong third argument

The GUI tells us that the buffer that PolarSSL passes to tis_recv() has size 16KiB-ish (not pictured), and that the variable len passed as third argument appears to take any of the values of its size_t type (pictured in the bottom panel below).

CVE-2013-5914-2.png

Screenshot 3: jumping back to the origin of the value

We inquire where the value of variable len was last set, and the GUI tells us it is at the yellow line just above the function call (pictured, in the middle panel). Well, hum, yes, we could have noticed that, but it was faster to click.

CVE-2013-5914-3.png

Screenshot 4: argument nb_want is too large

The value of len is computed from ssl_fetch_input()'s argument nb_want, and the value of nb_want appears to be too large, 65540, for the size of the buffer that the GUI tells us we have (in the screenshot below, the values computed as possible for nb_want are displayed in the bottom panel).

CVE-2013-5914-4.png

Screenshot 5: dangerous values come from caller ssl_read_record()

A new possibility offered by the Frama-C version I am using that may not even(*) be available in the latest release Fluorine is to observe, in the bottom panel, which call-sites and originating call-stacks cause which values to occur in a variable. Here, the GUI shows that nb_want appears to be 65540 when called from function ssl_read_record() at line 1178 in file ssl_tls.c of PolarSSL. This means we can continue the investigation there. In contrast, the value of nb_want can only be 5 when ssl_fetch_input() is called from ssl_parse_client_key_exchange(), so there is no need to look at that function: it is definitely not part of this problem.

(*) I don't remember. It has been a long time, has it not?

CVE-2013-5914-5.png

Screenshot 6: the seemingly too large argument is ssl->in_msglen

CVE-2013-5914-6.png

The problem is that ssl->in_msglen is too large here. But where does it come from?

Screenshot 7:

ssl->in_msglen has been computed from two bytes sent by the network (bad, bad network). But the value of ssl->in_msglen is supposed to have been validated before being used. This is what the lines between the obtention of the value and its use are supposed to do.

CVE-2013-5914-7.png

Screenshot 8:

CVE-2013-5914-8.png

The value of ssl->in_msglen is validated when ssl->minor_ver is 0, and it is validated when ssl->minor_ver is 1. But according to the GUI, ssl->minor_ver can be any of 0, 1 or 2.

Explanation

At this point it is only a matter of confirming that the call to ssl_read_record() can indeed be reached with ssl->minor_ver set to 2. This is where one switches to existential(*) mode, possibly crafting a malicious message, or simply building a convincing argument that values can converge here to cause bad things and send it to the official maintainer .

When I said that this was a modified PolarSSL 1.1.8 we were analyzing, I cheated a little. This is a 1.1.8 version in which I have re-introduced a bug that was there from 1.0.0 to 1.1.7. The principal maintainer of PolarSSL suggests to fix the bug by replacing == SSL_MINOR_VERSION_1 by >= SSL_MINOR_VERSION_1

(*) I like to think of this mode as “Bill Murray mode”, although you wouldn't see me say this out loud or put it in writing.

Conclusion

We have seen a second example of a real bug that can occur in an SSL implementation. Understandably in the current context, there has been much speculation over the last 48 hours on the innocence of the bug in Apple's implementation. Might it be a voluntarily inserted backdoor? Is the NSA behind this bug? (I link here to John Gruber's post because he writes clearly, but the same question is raised all over the Internet).

Allow me to put it this way: if the Apple SSL bug is a coup from the NSA, then you US citizens are lucky. Our spy agency in Europe is so much better that it does not even have a name you have heard before, and it is able to plant bugs where the buffer overflow leading to arbitrary code execution is three function calls away from the actual bug. The bug from our spy agency is so deniable that the code actually used to be fine when there were only two minor revisions of the SSL protocol. The backdoors from your spy agency are so lame that the Internet has opinions about them.


Real bugs come in all sizes and shapes. That one mistake with security implication also causes easily detectable unreachable code or wrongly-indented lines does not mean that all security flaws will be detected so easily, and that plenty of quirky but functionally correct code will not be wrongly flagged.

Speaking of varied bugs, “what about PolarSSL version 1.1.8 without the manually re-inserted bug from CVE-2013-5914?”, I hear you ask. This will be the subject of another blog post.

Acknowledgments

Philippe Herrmann was first to use Frama-C to find a security bug in PolarSSL (Philippe's bug was a denial of service in versions up to 1.1.1, fixed in 1.1.2). Anne Pacalet and Paul Bakker have helped me with various aspects of PolarSSL's verification, including but not limited to the bug described in this post. Twitter users aloria, thegrugq, and johnregehr provided comments on drafts of this post. And finally, the Internet made this post possible by being itself.

Tuesday, February 4 2014

Assertions

Jesse Ruderman on assertions and fuzzing

Jesse Ruderman has published a blog post on assertions and how they complement fuzzing. Key quote: “Fuzzers make things go wrong. Assertions make sure we find out.”

Readers of this blog are accustomed to me talking about differential testing, where a reference result (say, obtained by compiling a random C program with a quality compiler) is used to detect bugs in the target program (say, a static analysis framework for C programs). Differential testing imposes constraints: the random input must have a definite meaning, and a reference implementation needs to be available to compute this meaning. Often one finds bugs in the reference implementation together with bugs in the target program.

Besides, there are deeply burrowed bugs that are difficult to reveal with such a black-box approach. Assertions simplify the problem: if the code being tested contains enough well-chosen assertions, bugs can be caught without a reference implementation.

Sometimes, an assertion is a reference implementation: some of the assertions in Frama-C are alternative computations of the same intermediate result, followed by a comparison of the normally computed result with the alternately computed result. These assertions initially caught bugs in either computation. Since then, they have caught many more bugs in hash-consing, where the two results are structurally identical but are not shared.

Even when an assertion happens to contain a reference implementation for an intermediate result, it saves a lot of time to write the assertion as opposed to producing a complete reference implementation for the whole problem (say, interpreting C programs). The alternative implementation in the assertion does not have to be efficient: in Frama-C's value analysis, it is considered acceptable to spend 20% of the analysis time executing inefficient reference implementations in assertions.

John Regehr on writing and using assertions

John Regehr just published a blog post too on the subject of assertions. What a coincidence! Key quote: “we have tools that are specifically designed to help us reason about assertions; my favorite one for C code is Frama-C”.

In most of his post, John describes executable assertions written in the same language as the program being debugged. In the section quoted above, he moves to specific annotation languages to write assertions in. The advantages of using the same language as the programming language for assertions require no elaboration: it's the same language! The programmer already knows it, and the reviewer of the code already knows it.

But there is a spectrum of annotation languages for assertions. Eiffel stands its ground somewhere on this spectrum, very close to the underlying programming language but with enough good intentions to be noted. I think that the JML annotation language for Java was initially mostly intended for run-time checking, but ended up being very popular too as the annotation language used by static analyzers (I would be happy to be corrected if I am getting this wrong). Nearby JML lies E-ACSL, an executable subset of ACSL and also a Frama-C plug-in to convert a C program annotated with /*@ ... */ assertions into an instrumented C program that detects at run-time violated assertions. SPARK 2014 aims at making both camps happy.


I should point out for the sake of scientific integrity that I am being slightly cheeky in the choice of the key quote to represent John's post. I recommend you read the whole thing, of which the Frama-C endorsement is only a tiny fraction.

Taking advantage of C assertions with Frama-C

One can also use Frama-C in conjunction with existing executable assertions, typically implemented with the standard function assert() in header assert.h. The function assert() takes a C expression representing a property that should hold.

One way to take advantage of assertions is to fail to establish that they always hold, and warn the user that perhaps they don't. It is easy for anyone who wants this behavior to obtain it. One simply needs to specify assert() thus:

/*@ requires x != 0 ; */
void assert(int x);

With this specification, any existing call to the C function assert(), intended for the programmer to be executed at run-time, is required to have an argument that demonstrably corresponds to a non-null expression. This specification creates a link of sorts between static verification and run-time checking (or expressions intended to be checked at run-time, anyway).

Here is an example:

...
/*@ requires x >= 0 ;*/
double sqrt(double x);

int get_int(void);

double dist(double x, double y)
{
  double s = x * x + y * y;
  assert(s >= 0.0);
  return sqrt(s);
}

int main(int c, char **v){
  dist(get_int(), get_int());
}

When this program is analyzed, the value analysis detects that the assertion s >= 0.0 helpfully inserted by the programmer as an argument to assert() may not hold. In this particular example, the warning is a false positive, but never mind that for the moment.

$ frama-c -val t.c
...
t.c:4:[value] Function assert: precondition got status unknown.
...
t.c:1:[value] Function sqrt: precondition got status unknown.

Even more irritating in the example above is that after producing a false positive for assert(), the analyzer produces a false positive for the pre-condition of sqrt(). This brings us to another way a static checker could use C assertions to its advantage: it could take them as hints, properties that can be assumed to hold in order to avoid warning for problems that would seem to arise later if they did not.

With the user-specified function assert() above, the analyzer computes the truth value of the argument s >= 0.0. Because the set of values computed for s is approximated, the set of values for the expression s >= 0.0 is {0; 1}. The analyzer can tell that the pre-condition for assert() may not hold, but the relationship with the possible values of s is too indirect for it to decide that the forbidden situation corresponds to s negative and that only s positive is allowed.

There exists a better modelization of assert(). It was implemented as a value analysis built-in in order to offer the very functionality we are describing here:

$ frama-c -val -val-builtin assert:Frama_C_assert t.c
...
t.c:11:[value] warning: Frama_C_assert: unknown
...
t.c:1:[value] Function sqrt: precondition got status valid.
...
[value] Values at end of function dist:
  s ∈ [-0. .. 9.22337203685e+18]

On the same example, the builtin version of assert detects that it may be passed a null expression and warns about that. These is no improvement there (annotating the example to convince the analyzer that s is always positive is left as an exercise to the reader). Subsequently, and this is an improvement with respect to the previous analysis, the builtin does its best to incorporate the information provided in the assertion, so that it can tell that s is henceforth positive and that the pre-condition of sqrt() is satisfied.

Friday, January 24 2014

Bear-joke security is dead

Likely, you have heard this one before:

Two campers are surprised by an angry bear. One of them starts putting on eir running shoes. Surprised, the other exclaims “What are you doing, Alex? You can't outrun a bear!”

To which Alex replies: “I don't have to outrun the bear. I only have to outrun you.”

This joke used to be popular with security experts, who employed it as a metaphor. I only ever heard it in that context, the first time as a student in the late nineties. The bear joke was still used for this purpose in 2008.


You may also have heard that there is a new bear in town, and the new bear can take both Alex and eir friend, with or without running shoes. Also, the new bear could literally(*) eat a horse. And the bear is part of an organized network of hungry bears with walkie-talkies and sniper guns.

If your approach to security was based on bear jokes with dubious morals, now must really suck. Andy Green's blog post “Cryptography may not be dead, but it is on life support” is representative of the change. One of the quote he takes from Schneier's talk is:

Most of how the NSA deals with cryptography is by getting around it … They exploit bad implementations—we have lots of those.

Yes, we do, but we don't have to use them.


Here is to 2014 being the year of reliable cryptography implementations that cannot be circumvented through defects.


(*) “literally” within the confines of this metaphor

Friday, January 17 2014

Post-conditions and names of arguments

In an ACSL post-condition, any reference to the name of one of the function's arguments is assumed to refer to the initial value of the argument.

/* ensures arg == 1; */
void f(int arg)
{
  arg = 1;
}

For instance, in function f above, Frama-C's value analysis plug-in will typically say that the status of the post-condition is unknown, because arg is taken to mean \old(arg), the value passed to f and thus untouched by the assignment. The rationale is that arg has already ceased to live when the post-condition is evaluated: the program could not observe or otherwise depend on the value of arg after the call to f() anyway. On the other hand, it is convenient to be able to relate results and arguments of the function, and this can be done with the simple syntax “arg”. For a global variable G, one would have to write “\old(G)” to refer in f's post-condition to the value of G just before the call to f. The syntax “G” in a post-condition would refer to the value of G at the end of the function.


But do not worry, if you forget the above subtlety, you can always spend twenty minutes adding debug messages to the value analysis plug-in until you finally remember that said subtlety is what is actually implemented.

Saturday, December 21 2013

Improving the world for 2014, one programmer at a time

I move that all computing devices be equipped with a small explosive charge, set to explode when someone refers someone else to Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic without the referrer emself having read it.

It would not have to be lethal. I could be content if the apparatus maimed the user into harmlessness.


For what it is worth, the advice I wish I had received instead of being suggested Goldberg's essay the first time is:

  1. Print numbers in hexadecimal format as often as you can (%a in C99). Input them that way when it makes sense.
  2. Don't worry, floating-point numbers are just bits. Stop listening to anyone who tries to confuse you with generic reasoning about an arbitrary base β.


YMMV

Sunday, November 24 2013

C-Reduce and Frama-C

Automatic testcase reduction for compilers: the story so far

A previous post linked to a discussion of manual testcase reduction when the program being debugged is a compiler (and the input demonstrating the faulty behavior is thus a C program). Since then, quite a few related events took place:

In a comment to the linked post, someone mentioned automatic testcase reduction with “delta debugging”, and Bruce Dawson, the author of the original post, replied:

Given that the compiler sometimes changed what register it was using I’m not sure that the effort of writing an accurate test for this bug would be worthwhile. It really didn’t take me much time to reduce this bug — half an hour perhaps? Once you don’t need to link the code you can slash-and-burn at a fearsome rate, and my brain is more flexible at recognizing legitimate variations of the bug than any script would be.

And I further commented that on the other hand, a generic “accurate [automatic] test” only needs to be written once, and can serve multiple times. This post intends to show how right Bruce is: subtle pitfalls lurk when writing the acceptance test for automatically reducing a C program. Some of the dangers are theoretical enough when the program being debugged is a compiler, but become more annoying in practice when it is, say, a static analysis framework.

Meanwhile, John Regehr wrote a blog post of his own about the automatic reduction of testcases for compilers. The remarks in his post are quite orthogonal to the remarks in this one. My remarks are in the context of reducing testcases for Frama-C, which is a bit of a misappropriation, and reveals specific issues in the same way that using Csmith to find bugs in Frama-C provided some new insights.

The general idea explained on an example

Consider the example of a file from the Linux kernel that Frama-C's front-end chokes on. The original file causing the problem is 1.5MiB large after pre-processing and we would like automatic reduction to produce a smaller file that exhibits the same problem, so that we can show it to our Frama-C front-end expert.

The symptom of the problem is the error message below:

include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int

We can build a script that tests for the presence of this line in Frama-C's output. This script will then tell the reducer whether the latter is still on the right path or accidentally erased the interesting behavior.

A first obvious remark is that the test should not check for the presence of “include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int” exactly. If it does, then a reduced program can only pass the test by making it look as if the problem is at that file and line, which will prevent removal of #line file directives in the testcase, and also hinder the removal of irrelevant lines that only seem useful because they happen to make the error message come from the expected line.

Instead, the test should be only the presence of “failure: invalid implicit conversion from void to int” in the output of Frama-C. We think we will be happy with any testcase that produces this output, so we should configure the reducer with a test that detects this output and this output only. This idea will be a recurring theme in this post.

C-Reduce is the best reducer of C programs out there. C-Reduce takes as input a C program to reduce and a script implementing the interestingness test. A pre-condition is that the initial program has to be interesting. In exchange, C-Reduce produces the smallest interesting program that it can, removing and moving around bits of the original program and checking that the result remains interesting. When applied with the aforementioned 1.5MiB file from the Linux kernel and with the interestingness test frama-c t.i | grep "failure: invalid implicit conversion…", C-Reduce produces the minimal file below:

fn1 ()
{
    0 ? 0 * 0 ? : 0 : 0;
}

This superficially looks like something that we could show to our Frama-C front-end expert, although we will generate a better reduced program later in the post. If we make the mistake of grepping for "delay.h:39:[[]kernel[]] failure: invalid implicit conversion…" instead, C-Reduce still does a good job, preserving only a single additional line that happens to cause the warning to be localized where the interestingness test expects it:

#37 "include/linux/delay.h"
fn1 ()
{
    0 ? 0 * 0 ? : 0 : 0;
}

C-Reduce's only quirk may be that it sometimes reduces too much. In both reduced programs above, the function fn1() is implicitly typed as returning int according to now obsolescent C90 rules. In addition, the syntax e1 ? : e2 is not standard C but a GCC extension that happens to be accepted by the Frama-C front-end.

We have been lucky

We have in fact been lucky in the above reduction: the reduced program looks like something that should be accepted by Frama-C (despite the int ellipsis and the strange ? : syntax), and instead of accepting it, Frama-C emits an error message. This is exactly what we were hoping to obtain from the reduction process.

Consider the eventuality of the reduced program being an invalid program, incorrectly converting a void expression to int. Frama-C would be behaving correctly by emitting its warning on the reduced program. C-Reduce would have behaved according to specification, but we would be left without anything to show our front-end expert. We would have been had.

This is not a problem in C-Reduce, but a problem with us not using it properly. The reason the original 1.5MiB program is interesting is that it is accepted by GCC and rejected by Frama-C. This is the criterion the interestingness test should implement. When the interestingness test passed to C-Reduce implements an approximation of the notion of interestingness we really desire, we have only ourselves to blame if C-Reduce turns up an invalid reduced file that Frama-C correctly rejects.

Stop reducing so much, C-Reduce!

We do not want the Frama-C front-end expert we report the bug to to be sidetracked by considerations on obsolete syntax or strange extensions. The best testcase to show em is a short but perfectly standard C program. We can make this preference part of our interestingness test.

The first idea may be to reject any file that is not strictly C99-compliant. That would be a file that gcc -std=c99 -pedantic accepts to compile, I think. However, the unreduced testcase here is a file from the Linux kernel. The unreduced testcase does not pass this test.

Delta debugging is for preserving interestingness along size-reduction steps, not for making interestingness appear. Therefore, we cannot demand a reduced file that passes gcc -std=c99 -pedantic if the original file to start from does not pass it either.

Note: techniques borrowed from C-Reduce and from genetic programming might make interestingness appear out of uninteresting C files. Structural coverage of the program under test, beyond the ability to crash it, may make a useful, non-boolean fitness function. Or better, for a chosen assertion inside the program under test, coverage of the statements that lead to the assertion. This process would be different from delta debugging but might generate files that reveal bugs in static analysis frameworks. I hope someone will look into it someday.

However, simply compiling the tentative reduced file we already have with well-chosen compiler options reveals that GCC can already tell this file is not perfect. It emits the warnings snippets forbids omitting the middle term (regarding the syntax extension for ? :) and warning: return type defaults to (regarding implicit return types of functions). Our original file, although it originates from the Linux kernel, is not so ugly that it causes these warnings. Consequently, we can test for these warnings in an interestingness test that accepts the original file and will guide C-Reduce towards a clean reduced file.

If we do this, we obtain:

/*  */ void
fn1 ()
{
    0 ? 0 * 0 ? 0 : 0 : 0;
}

The Frama-C issue at hand was reported using the reduced C program as evidence, and was promptly fixed by Virgile Prevosto, Frama-C front-end expert extraordinaire.

C-Reduce is now parallel

The latest version of C-Reduce can explore several possible reductions in parallel. This is particularly interesting when the program under test takes a long time. A bug deep into the value analysis plug-in may take minutes to occur in the original program, and C-Reduce typically launches thousands of instances in the process of reducing a large program to a small one.

However, it does not pay to run too many instances of memory-bandwidth-limited programs in parallel. The computer I ran this example on has 4 cores and two execution threads by core (it relies on hyper-threading). The system thus sees 8 processors. By default, C-Reduce launches up to 8 instances in parallel. Is this the best choice?

If I use option C-Reduce's option -n to choose different levels of parallelism instead, I obtain, for this particular memory-intensive(*) program being debugged (Frama-C) and this particular initial C program, the following timings:

# instances    Time (s)
 1               620
 2               375
 3               346
 4               367
 5               396
 8               492

For this particular use, C-Reduce's default of using all available execution threads is faster than using only one execution thread. It is also slower than all other parallelization choices. The fastest reduction is obtained for 3 parallel instances, and it is almost twice faster than the non-parallel version.

There is some randomness to the reduction algorithm, however, and the experiment should be repeated with varying initial programs before conclusions are drawn.

(*) Frama-C can sometimes use as much as half the memory a modern web navigator typically uses. Imagine!

Making each instance terminate earlier

Even with an optimal choice of the degree of parallelism, automatic reduction can be time-consuming. A sub-optimal interestingness test for a bug that takes several minutes to appear in Frama-C can make the entire reduction take days, or exhaust the operator's patience altogether. This is only machine time we are talking about, but sometimes a human is waiting on the result of the reduction. In this section, we point out two useful tricks to bear in mind, especially when the program under test can take a long time for some inputs. The fact that the program under test takes a long time can be the problem being investigated, but does not have to.

First trick: using grep option -l

Often, as soon as a particular diagnostic message has been emitted, it is possible to conclude that the testcase is interesting. In the example above, as soon as the message “failure: invalid implicit conversion…” has been seen in Frama-C's output for a C program that GCC otherwise accepts, the input C program is known to be interesting.

By default, a shell script such as frama-c … test.c | grep "interesting message" allows Frama-C to finish its analysis before returning a boolean value. A simple improvement is to use grep's -l option, that causes grep to exit just after the matching line has been seen.

Contrast these two Unix sessions:

$ cat | grep interesting ; echo $?
nothing to see here
still nothing
this is interesting
this is interesting     ← this matching line is repeated by grep
one more line
two more lines
three more lines
cat is taking a long time
...
^D
0   ← error code of grep, printed by echo, indicating an interesting line was seen

In the above session, the grep command was still scanning its input after an interesting line had been seen, until I indicated that the “analysis” was finished with control-D. Below, I repeat the experiment, this time using grep -l to detect the keyword “interesting”.

$ cat | grep -l interesting ; echo $?
nothing to see here
still nothing
this is interesting
(standard input)     ← grep indicates a matching line has been seen on stdin
one more line
0   ← error code of grep, printed by echo, indicating an interesting line was seen

Thanks to the -l option, the cat command was terminated abruptly before it had finished, when emitting the line immediately after the interesting line (cat was terminated for trying to write to a closed file descriptor). Actually, this is a slight chink in the armor: the program whose output is piped into grep needs to write one more line after the interesting line in order to be terminated. If the interesting behavior is Frama-C taking a long time for some inputs, and the interesting log line is one that says “warning: things are going to take an infinite time from now on”, and nothing is ever printed after that line, Frama-C will not be terminated. My Unix-fu is too weak for me to offer a solution where the program under test is terminated as soon as possible, but the next subsection offers a palliative anyway.

Second trick: using a timeout

The initial input program may fail (i.e. “be interesting”) within, say, 5 minutes. This does not prevent slight variations of the program to take forever to fail, or to take forever to succeed. In each case, we do not want the reduction process to waste days of computations testing a single variation. We want the reduction process instead to try as many variations as possible quickly in order to converge to a short one as soon as possible. The strength of delta debugging is in iteration.

A simple idea here is to define the interestingness property as “the C program causes the interesting message to be emitted by Frama-C within 10 minutes”. Any program that has already been analyzed for 11 minutes is uninteresting by this definition. This property can be implemented simply with the Unix command ulimit.

Strictly speaking, the interestingness property should be completely reproducible, whereas this one is not. However, C-Reduce still works very well in practice with such a limit in place in the interestingness test. To limit the risk of breaking internal C-Reduce assertions, the time the initial program takes to be interesting should be multiplied by a safety factor. Because C-Reduce launches several instances in parallel, each instance runs a bit slower than if it was alone, and execution times can vary a bit. I have never had any trouble using as the timeout twice the time the initial program takes to fail.

Automating the manual

Let us take a step back and look at what we have been doing above. We started with one obvious interesting property, and we progressively refined it in order to delineate the really interesting programs. We initially thought we were interested in C programs that cause Frama-C to emit the message “include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int”, but it eventually turned out that we really wanted Frama-C to emit “failure: invalid implicit conversion from void to int” within 10 minutes for a C program that GCC compiles without emitting warnings about ? : syntax extensions or implicit return types.

We have been obtaining the additional sub-properties in a trial-and-error fashion: we did not know that we wanted any of them until C-Reduce produced a program that did not have it. Still, I believe that:

  1. These additional sub-properties would always come from the same repertoire in actual C-Reduce practice. It is possible to build a large catalog of these desirable properties so that the end user does not need to re-discover each of them.
  2. Before the reduction per se even starts, the original program should be tested for all the above niceness properties. All the niceness properties the original program has should be preserved in the reduced program by making them part of the interestingness test. That is, if the original program does not warn about implicit return types, the reduced program should not warn about implicit return types. On the other hand, if the original program does warn about implicit return types, then the bug being investigated may be about implicit return types, so it is not possible to require the reduced program to avoid the warning.
  3. Still, it would be a useful feature if the reducer could detect that a niceness property has appeared during reduction, and strove to preserve it from there on. This was perhaps easy to implement in the early non-parallel version of C-Reduce: it looks like it could be done entirely by side-effects in the interestingness test. Since, C-Reduce has been parallelized, which greatly helps with reduction speed but means that implementing this feature becomes more complicated, if it is at all desirable.

Conclusion

This post collects a number of useful tricks for using C-Reduce, in particular when reducing bugs for Frama-C. C-Reduce is already immensely useful as it is, but this post also indicates some future work directions to make its use more automatic and more convenient yet.

Saturday, October 26 2013

Jakob Engblom on the Toyota Acceleration Case

We interrupt our regularly scheduled program to link to this post by Jakob Engblom. The post was prompted by the jury's conclusion, in a lawsuit over what appears to be an automotive software glitch, that the carmaker is liable.

Monday, October 21 2013

Bruce Dawson on compiler bugs

Bruce Dawson has written a superb blog post on a Visual C++ compiler bug (now fixed), covering every aspect an essay on compiler bugs should cover.

I really like one section, that I am going to quote in full:

Security

In these paranoid days of the NSA subverting every computer system available every bug is a possible security flaw. This bug seems curiously commonplace – why wasn’t it discovered when building Windows or other 64-bit products? The most likely explanation is chance and happenstance, but compiler bugs can be used to make innocuous source code do surprising things. Maybe this bug is used as part of a back door to let code execution go in a direction that the source code says is impossible. This happened accidentally on the Xbox 360 where the low 32 bits of r0 were checked and then all 64 bits were used. This inconsistency allowed a hypervisor privilege escalation that led to arbitrary code execution.

This bug is probably not security related, but security is another reason to ensure that compiler bugs get fixed.


Bruce's post also illustrates the “most compiler bugs aren't” problem and provides a step-by-step tutorial on manual bug reduction. Regarding the latter, my reaction to Bruce's post was to bring up automatic testcase reduction with “delta debugging”, but Gravatar user jrrr beat me to it.

Frama-C contributes to making automatic testcase reduction work in practice for C compiler bugs by providing a reference C implementation that detects the undefined behaviors that could make the (original or reduced) bug reports invalid. We also use delta debugging internally to simplify our own reports of bugs in Frama-C. The next post in this blog will illustrate this use of delta debugging on a concrete example.

Friday, October 18 2013

OCaml's option -compact can optimize for code size and speed

The OCaml compiler can generate native code for various architectures. One of the few code generation options is named -compact:

$ ocamlopt -help
Usage: ocamlopt <options> <files>
Options are:
  ...
  -compact  Optimize code size rather than speed
  ...

One of the effects, perhaps the only effect(?), of this option is on the code generated for allocation. In OCaml, a small block typically gets allocated from the minor heap. When all goes well, allocating in the minor heap means simply decrementing the pointer that separates the already-allocated top of the minor heap from the still-available bottom of the minor heap. Eventually the minor heap becomes full; what happens then is complicated. But the fast path is basically a subtraction and a conditional branch (usually not taken) to the complicated stuff.

With option -compact, an allocation simply calls a routine that does all the above. Without it, the subtraction and conditional jump are inlined at the site where the allocation is requested. OCaml being a typical functional language, there are plenty of these. Inlining a couple of instructions makes the code bigger, but when execution stays on the fast path, which is often, a routine call is avoided completely.

A small example

Consider the one-line OCaml function let f i = Some i. This function takes a value i and returns a Some memory cell that points to i. I intended i to be an integer, but I inadvertently wrote code that works for any type of i. This was possible because in OCaml, all types of values share a uniform representation, as discussed in the introduction of a previous post.

A Some cell occupies 16 bytes (yes, this is huge. Blame 64-bit computing). The fast code generated by ocamlopt -S t.ml, where L102 is the label at which selcom-called code manages the case when the minor heap is full:

	...
	subq	$16, %r15
	movq	_caml_young_limit@GOTPCREL(%rip), %rax
	cmpq	(%rax), %r15
	jb	L102
	...

The -compact option causes more compact code to be emitted for the allocation. When our function f is compiled with ocamlopt -S -compact t.ml, the sequence generated for the allocation is much shorter indeed. There is even a specialized allocation function for allocating 16-byte blocs, so no arguments need to be set up:

	...
	call	_caml_alloc1
	...

Effects of option -compact in the large

When the development version of Frama-C is compiled with option -compact, the binary takes 14.4MB.

Without option -compact, the Frama-C binary weights in at 15.4MB.


One additional megabyte of code is not going to hurt us at this point, right? It is not as if the Frama-C binary had to fit exactly in a box of ten floppy disks. As long as the larger code is faster, it is worth it…

$ time  bin/toplevel.lean_and_mean tests/test/adpcm.c -sparecode > /dev/null

user	0m0.843s

$ time  bin/toplevel.big_and_slow tests/test/adpcm.c -sparecode > /dev/null

user	0m0.856s

… but it turns out that the larger code isn't faster. In the case of Frama-C, and of the computer I am typing on, at least, the code generated with option -compact is smaller and faster.

Conclusion

It is not easy to try to predict performance on modern out-of-order architectures, but if I had to hazard an explanation, I would say that, besides the obvious cache concerns, having many scattered predictable conditional branches may be a losing proposition compared to one single conditional branch visited from many places with call and ret. Every conditional branch that isn't in the branch prediction buffer is at risk of being mispredicted. In addition, each entry taken by the branches that have recently been visited is an entry that isn't available for branches translated from conditionals in the source code.

In contrast, the execution of the call and ret instructions has been optimized over the years. The target of the ret instruction is predicted with a Return Stack Buffer that has a conceptually simpler job than the Branch Predictor (although it is clearly possible to write code for which the ret instructions cause a stall, which could re-establish an advantage for the partially inlined version).


The conclusion is that everyone who uses OCaml should try the native compiler's option -compact, and that perhaps the default should be for this option to be on.

- page 1 of 23