Frama-C news and ideas


Tag - value


Friday, September 23 2016

A mini-tutorial of ACSL specifications for Value

(with the collaboration of F. Kirchner, V. Prevosto and B. Yakobowski)

Users of the Value plugin often need to use functions for which there is no available code, or whose code could be abstracted away. In such cases, ACSL specifications often come in handy. Our colleagues at Fraunhofer prepared the excellent ACSL by example report, but it is mostly directed at WP-style proofs.

In this post we explain how to specify in ACSL a simple function, in a way that is optimal for Value.

Prerequisites:

  • Basic knowledge of Value;
  • Basic knowledge of ACSL.

The messages and behaviors presented here are those of Frama-C Aluminium. Using another version of Frama-C might lead to different results.

A simple function

In this tutorial, we proceed in a gradual manner to isolate problems and make sure that each step works as intended. Let us consider the following informal specification:

// returns a random character between buf[0] and buf[n-1]
char get_random_char(char const *buf, unsigned n);

A minimal ACSL specification for this function must check two things:

  1. that n is strictly positive: otherwise, we'd have to select a character among an empty set, thereby killing any logician that would wander around at that time (buf is a byte array that may contain \0, so that there is no obvious way to return a default value in case n == 0);
  2. that we are allowed (legally, as per the C standard) to read characters between buf[0] and buf[n-1].

The following specification ensures both:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
*/
char get_random_char(char const *buf, unsigned n);

Note the spaces between the bounds in 0 .. n-1. A common issue, when beginning to write ACSL specifications, is to write ranges such as 0..n-1 without spaces around the dots. It works in most cases, but as 0..n is a single preprocessing number token (yes, this looks strange, but you can see it yourself in section 6.4.8 of the C11 standard if you're not convinced), funny things might happen when pre-processing the annotation. For instance, if we had a macro MAX instead of n (e.g. #define MAX 255), then writing [0..MAX] would result in the following error message: [kernel] user error: unbound logic variable MAX, since the pre-processor would dutifully consider 0..MAX as a single token, and thus would not perform the expansion of MAX. For that reason, we recommend always writing ranges with spaces around the dots.
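The tokenization effect can be observed with any ordinary C preprocessor, independently of Frama-C. The sketch below stringifies both spellings of the range. The names STR, XSTR, glued_range and spaced_range are made up for this illustration; MAX is the macro from the example above.

```c
#define MAX 255

/* Two-level stringification: XSTR macro-expands its argument, then STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* 0..MAX is a single preprocessing number, so MAX is never seen as a macro name. */
const char *glued_range(void) { return XSTR(0..MAX); }

/* With spaces, MAX is a separate identifier token and gets expanded to 255. */
const char *spaced_range(void) { return XSTR(0 .. MAX); }
```

Printing the two strings should show MAX surviving unexpanded in the first one, while the second one contains 255.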

To check our specification, we devise a main function that simply calls get_random_char with the appropriate arguments:

void main() {
  char *buf = "abc";
  char c = get_random_char(buf, 3);
}

Then, if we run this with Value (frama-c -val), it will produce the expected result, but with a warning:

[kernel] warning: No code nor implicit assigns clause for function get_random_char, generating default assigns from the prototype

By printing the output of Frama-C's normalisation (frama-c -print), we can see what its generated assigns clause looks like:

assigns \result;
assigns \result \from *(buf+(0 ..)), n;

Despite the warning, the Frama-C kernel generated a clause that is correct and not too imprecise in this case. But we should not rely on that, and we should avoid this warning whenever possible by writing our own assigns clauses:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
  assigns \result \from buf[0 .. n-1], n;
*/
char get_random_char(char const *buf, unsigned n);

Running Value again will not emit any warnings, plus it will indicate that both preconditions were validated:

[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing)
[kernel] Parsing file.c (with preprocessing)
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] Values of globals at initialization

[value] computing for function get_random_char <- main.
        Called from file.c:10.
[value] using specification for function get_random_char
file.c:2:[value] function get_random_char: precondition got status valid.
file.c:3:[value] function get_random_char: precondition got status valid.
[value] Done for function get_random_char
[value] Recording results for main
[value] done for function main
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  buf ∈ {{ "abc" }}
  c ∈ [--..--]

However, the result is still imprecise: c ∈ [--..--], that is, it may contain any character, not only 'a', 'b' or 'c'. This is expected: assigns \from ... clauses are used by Value to compute dependencies information, but they are not sufficient to constrain the exact contents of the assigned variables. We need to use postconditions for that, that is, ensures clauses.

The following ensures clause states that the result value must be one of the characters in buf[0 .. n-1]:

ensures \subset(\result, buf[0 .. n-1]);

The ACSL \subset predicate is interpreted by Value, which isolates the characters in buf, and thus returns a precise result: c ∈ {97; 98; 99}. Note that Value prints the result as integers, not as ASCII characters.
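For readers who want to run the example outside Frama-C, here is one possible implementation that respects the informal specification. This is an assumption on our part: the tutorial deliberately provides no code for get_random_char, and the use of rand() from the C standard library is only an illustrative choice.

```c
#include <stdlib.h>

/* Hypothetical implementation; the tutorial itself only gives the prototype. */
char get_random_char(char const *buf, unsigned n) {
  /* The precondition n > 0 rules out a division by zero here,
     and rand() % n always lies in 0 .. n-1, the range that
     \valid_read(buf+(0 .. n-1)) allows us to read. */
  return buf[(unsigned)rand() % n];
}
```

Note that rand() updates hidden library state, so proving the assigns clause against this body would need more care than the specification of the prototype suggests.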

This concludes the specification of a very simple partial function. But what if we want to take into account more possibilities, e.g. a NULL pointer in buf, or n == 0?

First try: a robust and concise specification

Our get_random_char function is simple, but not robust. It is vulnerable to accidents and attacks. Let us apply some Postel-style recommendations and be liberal in what we accept from others: instead of expecting that the user will not pass a NULL pointer or n == 0, we want to accept these cases, but return an error code if they happen.

We have two choices: add an extra argument to get_random_char that represents the result status (OK/error), or use the return value as the status itself, and return the character via a pointer. Here, we will adopt the latter approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

// if buf != NULL and n > 0, copies a random character from buf[0 .. n-1]
// into *out and returns OK.
// Otherwise, returns NULL_PTR if buf == NULL, or INVALID_LEN if n == 0.
status safe_get_random_char(char *out, char const *buf, unsigned n);

We could also have used errno here, but global variables are inelegant and, most importantly, it would defeat the purpose of this tutorial, wouldn't it?

This new function is more robust to user input, but it has a price: its specification will require more sophistication.

Before revealing the full specification, let us try a first, somewhat naive approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

/*@
  assigns \result, *out \from buf[0 .. n-1], n;
  ensures \subset(*out, buf[0 .. n-1]);
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *buf = "abc";
  char c;
  status res = safe_get_random_char(&c, buf, 3);
  //@ assert res == OK;
}

This specification has several errors, but we will try it anyway. We will reveal the errors as we progress.

Once again, we use a small C main function to check our specification, starting with the non-erroneous case. This reveals that Value does not return the expected result:

...
[value] Values at end of function main:
  buf ∈ {{ "abc" }}
  c ∈ [--..--] or UNINITIALIZED
  res ∈ {0}

Variable c is imprecise, despite our ensures clause. This is due to the fact that Value will not reduce the contents of a memory location that may be uninitialized. Thus, when specifying the ensures clause of a pointed variable, it is first necessary to state that the value of that variable is properly initialized.

This behavior may evolve in future Frama-C releases. In particular, our specification could have resulted in a more precise result, such as c ∈ {97; 98; 99} or UNINITIALIZED.

Adding ensures \initialized(out); before the ensures \subset(...) clause will allow c to be precisely reduced to {97; 98; 99}. This solves our immediate problem, but creates another: is the specification too strong? Could we have an implementation of safe_get_random_char in which \initialized(out) does not hold?

The answer here is definitely yes, especially in the case where buf is NULL. It is arguably also the case when n == 0, although it depends on the implementation.

The most precise and correct result that we expect here is c ∈ {97; 98; 99} or UNINITIALIZED. To obtain it, we will use behaviors.

Another reason to consider using behaviors is the fact that our assigns clause is too generic, leading to avoidable warnings by Value: because we have written that *out may be assigned, Value will try to evaluate if out is a valid pointer. If our main function includes a test such as res = safe_get_random_char(NULL, buf, 3), Value will output the following:

[value] warning: Completely invalid destination for assigns clause *out. Ignoring.

This warning is not needed here, but its presence is justified by the fact that, in many cases, it helps detect incorrect specifications, such as an extra *. Value will still accept our specification, but if we want a precise analysis, the best solution is to use behaviors to specify each case.

Second try: using behaviors

The behaviors of our function correspond to each of the three cases of enum status:

  1. NULL_PTR when either out or buf is NULL;
  2. INVALID_LEN when out and buf are not NULL but n == 0;
  3. OK otherwise.

ACSL behaviors can be named, improving readability and generating more precise error messages. We will define a set of complete and disjoint behaviors; this is not required in ACSL, but for simple functions it often matches more closely what the code would actually do, and is simpler to reason about.

One important remark is that disjoint and complete behaviors are not checked by Value. The value analysis does take them into account to improve its precision when possible, but if they are incorrectly specified, Value may not be able to warn the user about it.

Here is one way to specify three disjoint behaviors for this function, using mutually exclusive assumes clauses.

/*@
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid_read(buf+(0 .. n-1));
    requires \valid(out);
    // TODO: assigns and ensures clauses
  behavior null_ptr:
    assumes out == \null || buf == \null;
    // TODO: assigns and ensures clauses
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    // TODO: assigns and ensures clauses
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

Our choice of behaviors is not unique: we could have defined, for instance, two behaviors for null pointers, e.g. null_out and null_buf; but the return value is the same for both, so this would not improve precision.

Note that assumes and requires play different roles in ACSL: assumes are used to determine which behavior(s) are active, while requires impose constraints on the pre-state of the function or current behavior. For instance, one should not specify assumes \valid(out) in one behavior and assumes !\valid(out) in another: what this specification would actually mean is that the corresponding C function should somehow be able to distinguish between locations where it can write and locations where it cannot, as if one could write if (valid_pointer(buf)) in the C code. In practical terms, the main consequence is that such a specification would prevent Value from detecting errors related to memory validity.

For a more concrete example of the difference between them, you may consult the small appendix at the end of this post.

Assigns clauses in behaviors

Our specification for safe_get_random_char is incomplete: it has no assigns or ensures clauses.

Writing assigns clauses in behaviors is not always trivial, so here is a small, simplified summary of the main rules concerning the specification of such clauses for an analysis with Value:

  1. Global (default) assigns must always be present (even for complete behaviors), and must be at least as general as the assigns clauses in each behavior;
  2. Behaviors only need assigns clauses when they are more specific than the global one.
  3. Having a complete behaviors clause allows the global behavior to be ignored during the evaluation of the post-state, which may lead to a more precise result; but the global assigns must still be present.

Note: behavior inclusion is not currently checked by Value. Therefore, if a behavior's assigns clause is not included in the default one, the result is undefined.

Also, the reason why assigns specifications are verbose and partially redundant is in part because not every Frama-C plugin is able to precisely handle behaviors. They use the global assigns in this case.

For ensures clauses, the situation is simpler: global and local ensures clauses are simply merged with an implicit logical AND between them.
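As a small illustration of these rules, consider the following sketch. The function clamp_to_byte is hypothetical and not part of the tutorial. It carries a global assigns clause, a global ensures clause, and three behaviors whose own ensures clauses are implicitly conjoined with the global one.

```c
/*@
  assigns \result \from x;        // global assigns: always present
  ensures 0 <= \result <= 255;    // global ensures: holds in every behavior
  behavior low:
    assumes x < 0;
    ensures \result == 0;         // combined with the global ensures by AND
  behavior high:
    assumes x > 255;
    ensures \result == 255;
  behavior in_range:
    assumes 0 <= x <= 255;
    ensures \result == x;
  complete behaviors;
  disjoint behaviors;
*/
int clamp_to_byte(int x) {
  if (x < 0)
    return 0;     /* behavior low */
  if (x > 255)
    return 255;   /* behavior high */
  return x;       /* behavior in_range */
}
```

None of the behaviors repeats the assigns clause, since none of them is more specific than the global one.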

The following specification is a complete example of the usage of behaviors, with precise ensures clauses for both outputs (\result and out). The main function below tests each use case, and running Value results in valid statuses for all preconditions and assertions. The \from terms in the assigns clauses are detailed further below.

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
    ensures \subset(*out, buf[0 .. n-1]);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *msg = "abc";
  int len_arr = 4;
  status res;
  char c;
  res = safe_get_random_char(&c, msg, len_arr);
  //@ assert res == OK;
  res = safe_get_random_char(&c, NULL, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(NULL, msg, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(&c, msg, 0);
  //@ assert res == INVALID_LEN;
}
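To make the three behaviors more tangible, here is a sketch of what an implementation matching them could look like. It is an assumption, not code from the tutorial (which only specifies the prototype), and rand() is again just an illustrative source of randomness. Each branch corresponds to one behavior, and *out is written only in the ok case.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

/* Hypothetical implementation of the specified function. */
status safe_get_random_char(char *out, char const *buf, unsigned n) {
  if (out == NULL || buf == NULL)
    return NULL_PTR;                  /* behavior null_ptr: *out untouched */
  if (n == 0)
    return INVALID_LEN;               /* behavior invalid_len: *out untouched */
  *out = buf[(unsigned)rand() % n];   /* behavior ok: *out taken from buf[0 .. n-1] */
  return OK;
}
```

This also shows why a global ensures \initialized(out) would be too strong: the two error branches return without writing to *out.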

Our specification includes several functional dependencies (\from), but there is still one missing. Can you guess which one? The answer, as well as more details about why and how to write functional dependencies, will appear in the next post. Stay tuned!

Appendix: example of the difference between requires and assumes clauses

This appendix presents a small concrete example of what happens when the user mistakenly uses assumes clauses where requires clauses are needed. It is directed towards beginners in ACSL.

Consider the (incorrect) example below, where bzero_char simply writes 0 to the byte pointed to by the argument c:

/*@
  assigns *c \from c;
  behavior ok:
    assumes \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes !\valid(c);
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this example, Value will evaluate the validity of the pointer c, and conclude that, because it comes from a string literal, it may not be written to (therefore \valid(c) is false). Value then does nothing and returns, without any warnings or error messages.

However, had we written it the recommended way, the result would be more useful:

/*@
  assigns *c \from c;
  behavior ok:
    assumes c != \null;
    requires \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes c == \null;
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this case, Value will evaluate c as non-null, and will therefore activate behavior ok. This behavior has a requires clause, therefore Value will check that memory location c can be written to. Because this is not the case, Value will emit an alarm:

[value] warning: function bzero_char, behavior ok: precondition got status invalid.

Note that checking whether c is null or non-null is something that can be done in the C code, while checking whether a given pointer p is a valid memory location is not. As a rule of thumb, conditions in the code correspond to assumes clauses in behaviors, while requires clauses correspond to semantic properties, function prerequisites that cannot necessarily be tested by the implementation.
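The rule of thumb can be read off a possible implementation. The body below is a sketch under our own assumptions, since the appendix only specifies bzero_char: the null test is an ordinary C condition, whereas nothing in the code can check that c points to writable memory.

```c
#include <stddef.h>

/* Hypothetical implementation of the specified function. */
void bzero_char(char *c) {
  if (c == NULL)   /* testable in C: corresponds to "assumes c == \null" */
    return;
  *c = 0;          /* relies on "requires \valid(c)": not testable from C */
}
```

Calling this function on a string literal, as the main function above does, is undefined behavior in C, which is exactly what the requires clause lets Value report.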

Tuesday, May 20 2014

Frama-C blog becomes self-aware, author unnecessary

A reader's challenge

A couple of days ago, faithful reader David Gil sent in a challenge:

The reference code for Keccak/SHA-3 has a correctness bug in the Optimized64 implementation. Can the value analysis plugin find it?

My patch fixing that bug was accepted; I believe that the trunk is correct as well. (The correctness problem was in the Optimized64 implementation.)

Of course Value Analysis produces (many) warnings on the Keccak code, but I was unable to find settings that provided sufficient precision to recognize this bug as especially serious. (If you would like a hint, correctness depends on the color of your bird.)

The many warnings Value Analysis produced spurred me to substantially rewrite the Keccak Team's implementation. My version, with some ACSL annotations, is in a very preliminary state. I am sure there are many bugs... There are some attempts at verification in the directory verification/frama-c

A farewell

It is a great pleasure for me to discover that this blog has reached its independence. It has had more comments than posts for a little while, and now, with readers sending in their own posts, I might as well move to a start-up that, in collaboration with my previous employer CEA LIST, provides products and services based on Frama-C. I may even publish a blurb there from time to time when something newsworthy comes up.

My facetious colleague Virgile Prevosto will hopefully continue to provide insights on the more subtle aspects of ACSL's semantics here. I know I will read them.

Sunday, February 23 2014

An interesting SSL implementation bug: CVE-2013-5914

SSL in the news

SSL is a protocol for point-to-point confidential and authenticated communication over an insecure medium. It is the protocol behind HTTPS, among many other uses. In an Internet-connected system, the SSL implementation stands at the frontier between the system and the hostile outside world. For this reason, SSL implementation flaws are a prime target for attacks.

An ordinary bug

A rather banal SSL implementation bug was revealed over the weekend. A duplicated line in a crucial, if ordinary-looking, sequence of validation operations means that some of the validation steps are not taken:

    ...
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail; // <-------------------------------------------------
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Note that the second consecutive goto fail is executed with variable err containing the result of the last validation operation, 0, indicating success. This value is returned as-is by the function; the caller is therefore misled into believing that the validation succeeded, despite some of the validation steps not having been executed.
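The control flow can be reproduced in miniature. The sketch below is not Apple's code: check, verify_buggy and verify_fixed are hypothetical stand-ins, and the duplicated goto is written at its real nesting level to show what the compiler sees.

```c
/* Stand-in for one validation step: returns 0 on success, -1 on failure. */
static int check(int ok) { return ok ? 0 : -1; }

/* Mirrors the buggy flow; final_ok tells whether the last step would pass. */
int verify_buggy(int final_ok) {
  int err;
  if ((err = check(1)) != 0)
    goto fail;
  if ((err = check(1)) != 0)
    goto fail;
  goto fail;                          /* the duplicated line: always executed */
  if ((err = check(final_ok)) != 0)   /* never reached */
    goto fail;
fail:
  return err;                         /* still 0 from the previous step */
}

/* Same function without the duplicated line. */
int verify_fixed(int final_ok) {
  int err;
  if ((err = check(1)) != 0)
    goto fail;
  if ((err = check(1)) != 0)
    goto fail;
  if ((err = check(final_ok)) != 0)
    goto fail;
fail:
  return err;
}
```

With a failing last step, verify_buggy(0) still reports success, while verify_fixed(0) reports the error.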

Because C lacks an exception mechanism, the above is an often-seen programming pattern. The programming style here can hardly be blamed: this is how the best C programs are written. Except of course for the extraneous goto.

The SSL implementation, and thus the bug in question, are found in tens of millions of iOS devices, with a few additional Mac OS X computers on top of that. The scope of the security problem caused by this bug, and the obviousness of the issue when pointed out, have together led to much commentary on Twitter and other popular discussion forums.

So reaction. Very noise

“I would never have had this problem because I know that goto is bad”, some commenters claim. I wish I was caricaturing, but I am unfortunately only paraphrasing and combining several public reactions into one.


“I would never have had this problem because I use braces”, essentially state others. Certainly, the root cause of the problem must have been that the developer who introduced the bug was confused about the meaning of if (c) stmt1; stmt2;. Just look how ey indented it!

These two often-heard remarks strongly suggest using brace-less control flow or the presence of goto as predictors of defects in C code. I am sure this will be a fruitful research direction. Someone from the software engineering academic community should look into it.


“Just adding a single line of code can bring a system to its knees”, reminds Arie van Deursen. True, true, an important lesson there. We should all insert the following line somewhere in our respective codebases from time to time and take a good look at the effect it has, in remembrance.

(


It is Ariane 5 all over again! Worse, instead of just the makers of formal verification systems, everyone seems to have a scheme to prevent the problem they already know is there.

An interesting bug in a different SSL implementation

The problem in most of the armchair analysis that has been going on on the Internet lies in the two following questions: how many more bugs in security-critical C code does the proposed remedy (be it more braces, fewer gotos, …) find? How many time-costly, tedious, soul-crushingly boring false positives does it uncover?

Since commenters have been generalizing on the basis of a population of one sample, I feel no shame in presenting my own single example, raising the total number of scrutinized bugs to two. Note that, for us to make statistically sound arguments, we will eventually need to extend the discussion to examples of correct code that we do not wish to change.

Until then, here is a funny SSL implementation bug. It is characterized as follows, in a version of PolarSSL 1.1.8 that colleagues and I have been modifying.

Screenshot 1: an alarm in our tis_recv() function?

CVE-2013-5914-1.png

PolarSSL expects the program in which it is incorporated to provide it with a function that receives data from the outside world and writes it to a memory buffer. We have made such a function, baptized it tis_recv, and we have set up PolarSSL to use it.

The function tis_recv takes three arguments. The first one is a context argument in case someone needs one (our function ignores this argument). Second is a pointer to the buffer where the data is to be written, then third is the maximum size the function is allowed to write there.

We have specified our function tis_recv thus:

/*@ 
  requires recv_r1: \valid(output+(0 .. output_len-1)) ;
  requires recv_r2: output_len > 0 ; 
*/
int tis_recv(void* p, unsigned char * output, size_t output_len);

The screenshot indicates on the bottom right that the pre-condition recv_r1, which states that the argument output points to a buffer large enough for output_len characters, is not verified. How could this be? Surely a false positive… Or someone is calling the function with the wrong arguments. It does not look like the problem is in our function.

The GUI informs us that the function tis_recv is called in one place only, and that place is inside ssl_fetch_input(). It is called through a function pointer stored inside a member of a struct accessed through a pointer. The GUI tells us that we can mentally substitute ssl->f_recv(..) with tis_recv(...).

Screenshot 2: a wrong third argument

The GUI tells us that the buffer that PolarSSL passes to tis_recv() has size 16KiB-ish (not pictured), and that the variable len passed as third argument appears to take any of the values of its size_t type (pictured in the bottom panel below).

CVE-2013-5914-2.png

Screenshot 3: jumping back to the origin of the value

We inquire where the value of variable len was last set, and the GUI tells us it is at the yellow line just above the function call (pictured, in the middle panel). Well, hum, yes, we could have noticed that, but it was faster to click.

CVE-2013-5914-3.png

Screenshot 4: argument nb_want is too large

The value of len is computed from ssl_fetch_input()'s argument nb_want, and the value of nb_want appears to be too large, 65540, for the size of the buffer that the GUI tells us we have (in the screenshot below, the values computed as possible for nb_want are displayed in the bottom panel).

CVE-2013-5914-4.png

Screenshot 5: dangerous values come from caller ssl_read_record()

A new possibility offered by the Frama-C version I am using that may not even(*) be available in the latest release Fluorine is to observe, in the bottom panel, which call-sites and originating call-stacks cause which values to occur in a variable. Here, the GUI shows that nb_want appears to be 65540 when called from function ssl_read_record() at line 1178 in file ssl_tls.c of PolarSSL. This means we can continue the investigation there. In contrast, the value of nb_want can only be 5 when ssl_fetch_input() is called from ssl_parse_client_key_exchange(), so there is no need to look at that function: it is definitely not part of this problem.

(*) I don't remember. It has been a long time, has it not?

CVE-2013-5914-5.png

Screenshot 6: the seemingly too large argument is ssl->in_msglen

CVE-2013-5914-6.png

The problem is that ssl->in_msglen is too large here. But where does it come from?

Screenshot 7:

ssl->in_msglen has been computed from two bytes sent by the network (bad, bad network). But the value of ssl->in_msglen is supposed to have been validated before being used. This is what the lines between the point where the value is obtained and the point where it is used are supposed to do.

CVE-2013-5914-7.png

Screenshot 8:

CVE-2013-5914-8.png

The value of ssl->in_msglen is validated when ssl->minor_ver is 0, and it is validated when ssl->minor_ver is 1. But according to the GUI, ssl->minor_ver can be any of 0, 1 or 2.

Explanation

At this point it is only a matter of confirming that the call to ssl_read_record() can indeed be reached with ssl->minor_ver set to 2. This is where one switches to existential mode, possibly crafting a malicious message, or simply building a convincing argument that values can converge here to cause bad things and send it to the official maintainer.

When I said that this was a modified PolarSSL 1.1.8 we were analyzing, I cheated a little. This is a 1.1.8 version in which I have re-introduced a bug that was there from 1.0.0 to 1.1.7. The principal maintainer of PolarSSL suggests fixing the bug by replacing == SSL_MINOR_VERSION_1 by >= SSL_MINOR_VERSION_1.
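The shape of the bug and of the fix can be sketched as follows. This is not PolarSSL's actual code: the function names, the constants and the exact bounds are assumptions made for illustration, and only the replacement of == by >= comes from the maintainer's suggestion.

```c
#include <stddef.h>

/* Hypothetical constants, chosen for illustration only. */
#define DEMO_MINOR_VERSION_0 0
#define DEMO_MINOR_VERSION_1 1
#define DEMO_MAX_CONTENT_LEN 16384
#define DEMO_MAX_PADDING 256

/* Returns 1 if the announced record length is accepted, 0 if it is rejected. */
int length_accepted_buggy(int minor_ver, size_t msglen) {
  if (minor_ver == DEMO_MINOR_VERSION_0 && msglen > DEMO_MAX_CONTENT_LEN)
    return 0;
  if (minor_ver == DEMO_MINOR_VERSION_1 &&
      msglen > DEMO_MAX_CONTENT_LEN + DEMO_MAX_PADDING)
    return 0;
  return 1;   /* minor_ver == 2 falls through without any bound check */
}

/* Same checks with == replaced by >= for the second test. */
int length_accepted_fixed(int minor_ver, size_t msglen) {
  if (minor_ver == DEMO_MINOR_VERSION_0 && msglen > DEMO_MAX_CONTENT_LEN)
    return 0;
  if (minor_ver >= DEMO_MINOR_VERSION_1 &&
      msglen > DEMO_MAX_CONTENT_LEN + DEMO_MAX_PADDING)
    return 0;
  return 1;
}
```

In this sketch, a length of 65535 announced with minor version 2 is accepted by the first function and rejected by the second.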

Conclusion

We have seen a second example of a real bug that can occur in an SSL implementation. Understandably in the current context, there has been much speculation over the last 48 hours on the innocence of the bug in Apple's implementation. Might it be a voluntarily inserted backdoor? Is the NSA behind this bug? (I link here to John Gruber's post because he writes clearly, but the same question is raised all over the Internet).

Allow me to put it this way: if the Apple SSL bug is a trick from the NSA, then you US citizens are lucky. Our spy agency in Europe is so much better that it does not even have a name you have heard before, and it is able to plant bugs where the buffer overflow leading to arbitrary code execution is three function calls away from the actual bug. The bug from our spy agency is so deniable that the code actually used to be fine when there were only two minor revisions of the SSL protocol. The backdoors from your spy agency are so lame that the Internet has opinions about them.


Real bugs come in all sizes and shapes. That one mistake with security implications also causes easily detectable unreachable code or wrongly-indented lines does not mean that all security flaws will be detected so easily, nor that plenty of quirky but functionally correct code will not be wrongly flagged.

Speaking of varied bugs, “what about PolarSSL version 1.1.8 without the manually re-inserted bug from CVE-2013-5914?”, I hear you ask. This will be the subject of another blog post.

Acknowledgments

Philippe Herrmann was first to use Frama-C to find a security bug in PolarSSL (Philippe's bug was a denial of service in versions up to 1.1.1, fixed in 1.1.2). Anne Pacalet and Paul Bakker have helped me with various aspects of PolarSSL's verification, including but not limited to the bug described in this post. Twitter users aloria, thegrugq, and johnregehr provided comments on drafts of this post. And finally, the Internet made this post possible by being itself.

Tuesday, February 4 2014

Assertions

Jesse Ruderman on assertions and fuzzing

Jesse Ruderman has published a blog post on assertions and how they complement fuzzing. Key quote: “Fuzzers make things go wrong. Assertions make sure we find out.”

Readers of this blog are accustomed to me talking about differential testing, where a reference result (say, obtained by compiling a random C program with a quality compiler) is used to detect bugs in the target program (say, a static analysis framework for C programs). Differential testing imposes constraints: the random input must have a definite meaning, and a reference implementation needs to be available to compute this meaning. Often one finds bugs in the reference implementation together with bugs in the target program.

Besides, there are deeply burrowed bugs that are difficult to reveal with such a black-box approach. Assertions simplify the problem: if the code being tested contains enough well-chosen assertions, bugs can be caught without a reference implementation.

Sometimes, an assertion is a reference implementation: some of the assertions in Frama-C are alternative computations of the same intermediate result, followed by a comparison of the normally computed result with the alternately computed result. These assertions initially caught bugs in either computation. Since then, they have caught many more bugs in hash-consing, where the two results are structurally identical but are not shared.

Even when an assertion happens to contain a reference implementation for an intermediate result, it saves a lot of time to write the assertion as opposed to producing a complete reference implementation for the whole problem (say, interpreting C programs). The alternative implementation in the assertion does not have to be efficient: in Frama-C's value analysis, it is considered acceptable to spend 20% of the analysis time executing inefficient reference implementations in assertions.
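The pattern is easy to show on a toy problem. The sketch below is written in C rather than in Frama-C's implementation language, and both functions are hypothetical examples: the assertion recomputes the result with a slow but obviously correct method and compares it with the result of the faster one.

```c
#include <assert.h>
#include <stdint.h>

/* Slow but obviously correct: inspect each of the 32 bits in turn. */
static unsigned popcount_reference(uint32_t x) {
  unsigned count = 0;
  for (int i = 0; i < 32; i++)
    count += (x >> i) & 1u;
  return count;
}

/* Faster version: clear the lowest set bit until none remain. */
unsigned popcount(uint32_t x) {
  uint32_t original = x;
  unsigned count = 0;
  while (x != 0) {
    x &= x - 1;
    count++;
  }
  /* The assertion is a reference implementation for this intermediate result. */
  assert(count == popcount_reference(original));
  return count;
}
```

Compiling with NDEBUG defined removes the check, so the cost of the reference computation is only paid in builds where assertions are enabled.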

John Regehr on writing and using assertions

John Regehr just published a blog post too on the subject of assertions. What a coincidence! Key quote: “we have tools that are specifically designed to help us reason about assertions; my favorite one for C code is Frama-C”.

In most of his post, John describes executable assertions written in the same language as the program being debugged. In the section quoted above, he moves to specific annotation languages to write assertions in. The advantages of using the same language as the programming language for assertions require no elaboration: it's the same language! The programmer already knows it, and the reviewer of the code already knows it.

But there is a spectrum of annotation languages for assertions. Eiffel stands its ground somewhere on this spectrum, very close to the underlying programming language but with enough good intentions to be noted. I think that the JML annotation language for Java was initially mostly intended for run-time checking, but ended up being very popular too as the annotation language used by static analyzers (I would be happy to be corrected if I am getting this wrong). Nearby JML lies E-ACSL, an executable subset of ACSL and also a Frama-C plug-in to convert a C program annotated with /*@ ... */ assertions into an instrumented C program that detects at run-time violated assertions. SPARK 2014 aims at making both camps happy.


I should point out for the sake of scientific integrity that I am being slightly cheeky in the choice of the key quote to represent John's post. I recommend you read the whole thing, of which the Frama-C endorsement is only a tiny fraction.

Taking advantage of C assertions with Frama-C

One can also use Frama-C in conjunction with existing executable assertions, typically implemented with the standard function assert() in header assert.h. The function assert() takes a C expression representing a property that should hold.

One way to take advantage of assertions is to fail to establish that they always hold, and warn the user that perhaps they don't. It is easy for anyone who wants this behavior to obtain it. One simply needs to specify assert() thus:

/*@ requires x != 0 ; */
void assert(int x);

With this specification, any existing call to the C function assert(), intended for the programmer to be executed at run-time, is required to have an argument that demonstrably corresponds to a non-null expression. This specification creates a link of sorts between static verification and run-time checking (or expressions intended to be checked at run-time, anyway).

Here is an example:

...
/*@ requires x >= 0 ;*/
double sqrt(double x);

int get_int(void);

double dist(double x, double y)
{
  double s = x * x + y * y;
  assert(s >= 0.0);
  return sqrt(s);
}

int main(int c, char **v){
  dist(get_int(), get_int());
}

When this program is analyzed, the value analysis detects that the assertion s >= 0.0 helpfully inserted by the programmer as an argument to assert() may not hold. In this particular example, the warning is a false positive, but never mind that for the moment.

$ frama-c -val t.c
...
t.c:4:[value] Function assert: precondition got status unknown.
...
t.c:1:[value] Function sqrt: precondition got status unknown.

Even more irritating in the example above is that after producing a false positive for assert(), the analyzer produces a false positive for the pre-condition of sqrt(). This brings us to another way a static checker could use C assertions to its advantage: it could take them as hints, properties that can be assumed to hold in order to avoid warning for problems that would seem to arise later if they did not.

With the user-specified function assert() above, the analyzer computes the truth value of the argument s >= 0.0. Because the set of values computed for s is approximated, the set of values for the expression s >= 0.0 is {0; 1}. The analyzer can tell that the pre-condition for assert() may not hold, but the relationship with the possible values of s is too indirect for it to decide that the forbidden situation corresponds to s negative and that only s positive is allowed.

There exists a better way to model assert(). It was implemented as a value analysis built-in in order to offer the very functionality we are describing here:

$ frama-c -val -val-builtin assert:Frama_C_assert t.c
...
t.c:11:[value] warning: Frama_C_assert: unknown
...
t.c:1:[value] Function sqrt: precondition got status valid.
...
[value] Values at end of function dist:
  s ∈ [-0. .. 9.22337203685e+18]

On the same example, the builtin version of assert detects that it may be passed a null expression and warns about that. There is no improvement there (annotating the example to convince the analyzer that s is always positive is left as an exercise to the reader). Subsequently, and this is an improvement with respect to the previous analysis, the builtin does its best to incorporate the information provided in the assertion, so that it can tell that s is henceforth positive and that the pre-condition of sqrt() is satisfied.

Friday, January 17 2014

Post-conditions and names of arguments

In an ACSL post-condition, any reference to the name of one of the function's arguments is assumed to refer to the initial value of the argument.

/*@ ensures arg == 1; */
void f(int arg)
{
  arg = 1;
}

For instance, in function f above, Frama-C's value analysis plug-in will typically say that the status of the post-condition is unknown, because arg is taken to mean \old(arg), the value passed to f and thus untouched by the assignment. The rationale is that arg has already ceased to live when the post-condition is evaluated: the program could not observe or otherwise depend on the value of arg after the call to f() anyway. On the other hand, it is convenient to be able to relate results and arguments of the function, and this can be done with the simple syntax “arg”. For a global variable G, one would have to write “\old(G)” to refer in f's post-condition to the value of G just before the call to f. The syntax “G” in a post-condition would refer to the value of G at the end of the function.
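The contrast between an argument and a global can be sketched as follows. The function add_to_G(), the bounds in its pre-condition and the run-time assert() are made up for this illustration. In the post-conditions, arg denotes the value passed by the caller, whereas the initial value of G has to be written \old(G).

```c
#include <assert.h>

int G;

/*@ requires 0 <= G <= 1000 && 0 <= arg <= 1000;
    assigns G;
    ensures G == \old(G) + arg;
    ensures \result == \old(G);
*/
int add_to_G(int arg)
{
  /* Run-time mirror of part of the pre-condition (illustration only). */
  assert(0 <= arg && arg <= 1000);
  int before = G;
  G += arg;
  arg = 0;   /* invisible to the post-conditions: there, arg means \old(arg) */
  return before;
}
```

Starting from G equal to 5, the call add_to_G(3) returns 5 and leaves 8 in G, which is what the two post-conditions say even though arg is overwritten before the function returns.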


But do not worry, if you forget the above subtlety, you can always spend twenty minutes adding debug messages to the value analysis plug-in until you finally remember that said subtlety is what is actually implemented.

Wednesday, July 31 2013

From Pascal strings to Python tuples

Quiz time

What does the program below do?

#include <stdio.h>

int main(){
  struct {
    int t[4];
    int u;
  } v;
  v.u = 3;
  v.t[4] = 4;
  printf("v.u=%d\n", v.u);
  return 0;
}

Two answers are “it prints v.u=4” and “it prints v.u=3”:

$ gcc t.c && ./a.out 
v.u=4
$ gcc -O2 t.c && ./a.out 
v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v. The problem is that v.t+4 is outside object v.t. So GCC does what it pleases, and when compiling with -O2, optimizes brutally:

$ gcc -S -O2 t.c && cat t.s
.LC0:
	.string	"v.u=%d\n"
…
	movl	$3, %edx
	movl	$.LC0, %esi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c
t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4, but here the index is hard-coded as 4, so line 9 is only valid when 4 < 4. That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h"

typedef struct{
  int tab[1];
} ts;

int main() {
  ts *q = malloc(5*sizeof(int));
  q->tab[2]= 5;
  return 1;
}

If you guessed “invoke undefined behavior”, well done!


The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[1];

   /* ob_item contains space for 'ob_size' elements.
    * Items must normally not be NULL, except during construction when
    * the tuple is not yet visible outside the function that builds it.
    */
} PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, the C99 standard gave us “flexible array members” (a last member declared with an incomplete array type), a safe way to implement tuples and Pascal strings:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[];
…
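For the Pascal string side of the story, a minimal sketch of the C99 idiom could look like the code below. The type pstring and the function pstring_new() are hypothetical names, not taken from Python or from any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A Pascal-style string: a length followed by that many bytes. */
typedef struct {
  size_t len;
  char data[];   /* C99 flexible array member: no size, must come last */
} pstring;

/* Allocates a pstring holding a copy of the len bytes found at src.
   Returns NULL if the allocation fails. */
pstring *pstring_new(const char *src, size_t len)
{
  assert(src != NULL);
  /* sizeof *p does not count data[], so the payload is added explicitly. */
  pstring *p = malloc(sizeof *p + len);
  if (p == NULL)
    return NULL;
  p->len = len;
  memcpy(p->data, src, len);
  return p;
}
```

With this declaration, p->data[i] is a valid access for any i smaller than the number of bytes allocated for the payload, so the compiler has no license to assume that only index 0 exists.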

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).

Thursday, July 11 2013

Arithmetic overflows in Fluorine

There is a C quiz in the middle of this post, lost in the middle of all the reminiscing.

A history of arithmetic overflows in Frama-C

From the very beginnings in 2005, until after the first Open Source release in 2008, Frama-C's value analysis was assuming that all arithmetic overflows produced two's complement results. This seemed the right thing to do at the time.

Then an option was introduced to emit alarms on undefined overflows. John Regehr suggested it after testing one of the Open Source versions. The option was turned off by default. If a value analysis user turned the option on, any undefined arithmetic overflow in the program would be detected and reported as a possible error, with the same gravity as dereferencing NULL or accessing an array out of its bounds.

Later, a helpful reminder was added to the value analysis' output: in the default behavior of not emitting alarms, an informative message was emitted instead—if such an overflow was detected—about two's complement being assumed.

There was one last change in the last release, Fluorine. Actually, two changes: the name of the option for emitting alarms on undefined overflows changed, and the default value changed. The setting is now to emit alarms by default, and can be changed to not emitting them, for instance if the target code is destined to be compiled with gcc -fno-strict-overflow -fwrapv, in which case all overflows happening during execution can be expected to produce two's complement results.


One aspect remains unchanged in the above evolution: the discussion only applies to undefined overflows.

The philosophy was always to analyze programs as they were written, and not to force any change of habit on software developers. The initial choice not to warn about overflows was because we knew there were so many of these—most of them intentional—that we would be deluging the user with what would feel like a flood of false positives.

The gradual shift towards more arithmetic overflow awareness is a consequence of the fact that in C, some arithmetic overflows are undefined behavior. Compilers display increasing sophistication when optimizing the defined behaviors to the detriment of the predictability of undefined ones. To make a long story short, the “overflows produce 2's complement results” heuristic was wrong for some programs as compiled by some optimizing compilers.

In keeping with the same philosophy, “overflows” that are defined according to the C99 standard have always been treated by the value analysis plug-in with the semantics mandated by the standard. Those overflows that the standard says must have “implementation-defined” results are treated with the semantics that the overwhelming majority of compilation platforms give them (and it remains possible to model other architectures as the need arises).

A quiz

Other static analyzers may also warn for arithmetic overflows, but the philosophy can be different. The philosophy may for instance be that any overflow, regardless of whether it is defined or implementation-defined according to the letter of the standard, might be unintentional and should be brought to the attention of the developer.

In the few examples below, the goal is to predict whether Frama-C's value analysis with its new default setting in Fluorine would emit an alarm for an overflow. For extra credit, a secondary goal is to predict whether another static analyzer that warns for all overflows would warn. We assume a 32-bit int and a 16-bit short, same as (almost) everybody has.

int a = 50000;
int b = 50000;
int r = a * b;

unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

int a = 50000;
int b = 50000;
unsigned int r = a * b;

short a = 30000;
short b = 30000;
short r = a * b;

unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Answers

int a = 50000;
int b = 50000;
int r = a * b;

There is an overflow in this snippet (mathematically, 50000 * 50000 is 2500000000, which does not fit in an int). This overflow is undefined, so the value analysis warns about it.


unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

The multiplication is an unsigned int multiplication, and when the mathematical result of unsigned operations is out of range, the C99 standard mandates that overflows wrap around. Technically, the C99 standard says “A computation involving unsigned operands can never overflow, …” (6.2.5:9) but we are using the word “overflow” with the same meaning as everyone outside the C standardization committee including Wikipedia editors.

To sum up, in the C99 standard, overflows in signed arithmetic are undefined and there are no overflows in unsigned arithmetic (meaning that unsigned overflows wrap around).


int a = 50000;
int b = 50000;
unsigned int r = a * b;

The multiplication is again a signed multiplication. It does not matter that the result is destined to an unsigned int variable because in C, types are inferred bottom-up. So the value analysis warns about an undefined overflow in the multiplication here.


short a = 30000;
short b = 30000;
short r = a * b;

There is no overflow here in the multiplication because the last line behaves as short r = (short) ((int) a * (int) b);. The justification for this behavior can be found in clause 6.3.1 of the C99 standard about conversions and promotions (the general idea is that arithmetic never operates on types smaller than int or unsigned int. Smaller types are implicitly promoted before arithmetic takes place). The product 900000000 does fit in the type int of the multiplication. But then there is a conversion when the int result is assigned to the short variable r. This conversion is implementation-defined, so the value analysis does not warn about it, but another static analyzer may choose to warn about this conversion.


unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Perhaps contrary to expectations, there is an undefined overflow in the multiplication a * b in this example. Right in the middle of the aforementioned 6.3.1 clause in the C99 standard, on the subject of the promotion of operands with smaller types than int, the following sentence can be found:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.

All values of a 16-bit unsigned short fit a 32-bit int, so the third line is interpreted as unsigned int r = (unsigned int) ((int) a * (int) b);.


Incidentally, things would be completely different in this last example if int and short were the same size, say if int was 16-bit. In this case the third line would be equivalent to unsigned int r = (unsigned int) a * (unsigned int) b; and would only contain an unsigned, innocuous overflow.
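If the intention in the last example was an unsigned multiplication, one way to get it regardless of the relative sizes of short and int is to convert an operand explicitly. The sketch below uses hypothetical names, mul_ushort() and mul_ushort_demo(), and the expected value in the demo assumes the 32-bit int and 16-bit short of the quiz.

```c
#include <assert.h>

/* Multiplies two unsigned short values without risking a signed overflow.
   Converting one operand to unsigned int makes the usual arithmetic
   conversions convert the other one too, so the multiplication is an
   unsigned one and wraps around instead of being undefined. */
unsigned int mul_ushort(unsigned short a, unsigned short b)
{
  return (unsigned int)a * b;
}

/* Usage check, valid for a 32-bit int and a 16-bit short. */
void mul_ushort_demo(void)
{
  assert(mul_ushort(50000, 50000) == 2500000000u);
}
```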

Wrapping up

In Fluorine, the option to activate or deactivate the emission of these undefined arithmetic overflow alarms is called -warn-signed-overflow (the opposite version for no alarms being -no-warn-signed-overflow). I felt that providing this piece of information earlier would have rendered the quiz too easy.

Although Frama-C's value analysis adheres to the semantics of C and only warns for undefined overflows, it is possible to use Frama-C to check for the other kinds of overflows by using another plug-in, Rte, to automatically annotate the target C program with ACSL assertions that express the conditions for overflows. Note that the earlier post describing this technique pre-dates the Fluorine release and is written in terms of the old options.

Wednesday, October 10 2012

RERS 2012 competition: our solutions for problems 1-9

Previously on this blog

Although it was so brief that you may have missed it, I previously mentioned here the 2012 RERS Grey Box Challenge, an interesting competition where the questions involve programs in C syntax.


I pointed out that some questions were about the reachability of assertions in the programs. As it is documented, Frama-C's value analysis can guarantee that some assertions are not reachable. When its authors are using it, they can ascertain that no approximation is taking place, and that for this reason, any assertion that the analysis diagnoses as perhaps reachable is indeed reachable for some sequence of inputs. Because of the way it works, the analysis is unable to exhibit the sequence of inputs, but one such sequence definitely exists.

We then moved on to properties expressed as LTL formulas. Some of the LTL formulas in the competition expressed reachability properties. It was easy to diagnose those by applying Frama-C's value analysis to an instrumented program. The other properties were more difficult liveness properties. With more instrumentation, involving either a transition counter or a non-deterministic predecessor state, it was possible to diagnose these too using the value analysis.

Colleagues and I have obtained solutions for all properties in the initial problems 1-9 of the competition. More information about these solutions is available in the last part of this post. First, now that the exercise is done, I would like to revisit its benefits.

Features gained, bugs fixed

New built-in Frama_C_interval_split()

A key part of the main() function used for the verification of any kind of property in the competition was:

  input = unknown_int();
  /*@ assert input == 1 || input == 2 || input == 3 ||
                input == 4 || input == 5 || input == 6 ; */

These three lines are intended to replace the following one in the original main() function provided by the organizers. Their line works for interactive sessions where the operator inputs numbers in the range 1..6.

        scanf("%d", &input);        


The call to unknown_int() could also have been Frama_C_interval(1, 6). Then the value analysis would be able to tell that the assertion is not an assumption, only a hint. Regardless, the assertion is there, together with option -slevel, to make the analyzer study separately what happens in executions where a different value is read for input.


This is a classic trick used in many earlier studies. It is mentioned in the study of the Skein-256 hash function and in nearly all others since. Still, these assertions do get tiring to write when many cases need to be listed. Until now, there was no shortcut (I have been using Emacs macros myself).

Enter Frama_C_interval_split(l, u). It does the same thing that Frama_C_interval(l, u) does, but it automatically causes the individual values between l and u inclusive to be propagated separately, without need for a laborious ACSL assertion. The Frama_C_interval_split() built-in is available in Oxygen (I think), and since it is a true value analysis built-in, the user does not need to remember to include file builtin.c to make it accessible. A simple prototype int Frama_C_interval_split(int, int); will do.
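As a sketch, the input-reading part of main() could then be written as below. The function read_input() and the macro UNDER_VALUE_ANALYSIS are names invented for this example. The macro is assumed to be defined by the user when the file is given to the analyzer, and the stand-in definition only exists so that the file also links as an ordinary program.

```c
#include <assert.h>

/* Prototype only: as said above, the value analysis provides the
   built-in itself. */
int Frama_C_interval_split(int lower, int upper);

#ifndef UNDER_VALUE_ANALYSIS
/* Hypothetical stand-in for ordinary compilation. It always picks the
   lower bound. UNDER_VALUE_ANALYSIS is a user-chosen macro, not
   something that Frama-C is claimed to define. */
int Frama_C_interval_split(int lower, int upper)
{
  assert(lower <= upper);
  return lower;
}
#endif

/* Replaces scanf("%d", &input): the six possible inputs are meant to be
   propagated separately, with no ACSL disjunction to write. */
int read_input(void)
{
  return Frama_C_interval_split(1, 6);
}
```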

The informational message “unrolling up to ... states” could be off

If you have used option -slevel (and as previously said, if you use Frama-C's value analysis at all, you should definitely use this option from time to time), you know that it sometimes logs a message to tell how much superposition has been done on the same program statement. It looks like:

...
[value] Semantic level unrolling superposing up to 100 states
...
[value] Semantic level unrolling superposing up to 200 states
...

The algorithm that takes care of this message was like this:

  target := 100;
  if new number of states attached to the statement ≥ target
  then
    print "Semantic level unrolling superposing up to " target " states"
    target := target + 100
  endif

The intention was that the value displayed would indicate the maximum number of superposed states to the nearest hundred.


Suppose the first wave of propagated states to reach a statement contains a thousand of them. The algorithm displays “superposing up to 100 states”, and updates the target to 200. If the second wave contains 2500 more states, the algorithm then displays “superposing up to 200 states” and updates the target to 300. And so on. If states keep arriving in large numbers, variable target gets irremediably behind.


This had not been noticed until this competition, but in model-checking mode, manipulating tons of states, target never gets a chance to catch up and the number displayed can be much lower than the number of states actually superposed.


This bug is present in Frama-C Oxygen. It is fixed in the development version.

An unnamed option was tested more thoroughly

Last but proverbially not least, thanks to this competition, the value analysis option that caches previous analyses of a function call in order to re-use them without loss of precision has received more testing. Who knows, someday we may even trust it enough to reveal its name.

Conclusion

The next section describes our solutions. If you did not at least attempt the competition yourself, it is unlikely that you will find it interesting: you should stop reading now. Here is my personal conclusion before you go: case studies are fun, and model-checking is a fun sub-field of computer science. At the same time, a piece of software is not an automaton: if you try to specify one as if it was the other you will certainly make mistakes and verify something other than you intended. I think that our experience trying to participate in the competition demonstrates that. If you are not convinced, please try to answer the competition's questions for yourself, and then, and only then, read the next section.

Our solutions to the 2012 RERS challenge

Description

The reachability of assertions is straightforward to answer with the value analysis, so I won't describe that. Here is an archive containing our answers for the LTL properties part. The archive contains mostly everything. Many files there were generated from others. In particular, the *.log files are analysis log files that took about one core-month to generate on a 2.66 GHz Intel Xeon X5650 workstation. The last file to be generated was “results”. It contains the predicted statuses of properties as interpreted with the conventions that follow.


A specific LTL property for a specific problem is checked by putting together the problem's transition function (function calculate_output() and its callees), a file buchimgt.c specific to the problem (this file contains the definition of the predecessor state and functions that manipulate it), a file main.c that contains a generic main loop, and a file such as output Z occurs after output U until output W.c that is specific to a property (but independent of the problem to which it is linked).

The problem-specific files are generated by the script genallproblemfiles.

The property-specific files such as output Z occurs after output U until output W.c were generated from templates such as output _ occurs after output _ until output _.c. The templates were hand-crafted by my colleague Sébastien Bardin to work together with the rest of the instrumentation.

The current status of the LTL property is maintained by an automaton designed for this kind of work (called a Büchi automaton). The generic main() function describes our interpretation of the problem. In pseudo-code:

forever
   read input 1..6
   compute output from input and current state
   compute transition in Büchi automaton from input and output
   if the current trace can no longer fail to satisfy the property
      prune 
   if the current trace can no longer satisfy the property
      display current state for verification
      abort analysis
   non-deterministically update predecessor state

The actual C file is designed so that the value analysis will remain complete as well as sound when analyzing it with high -slevel. A secondary objective is for the analysis to be as fast as possible despite current limitations. Readability only ranks third as an objective.

Still, regardless of the strange structure of the program, it is basically C. Any C programmer can read the main() and understand what is being verified. The only non-standard function is Frama_C_interval_split(). The informal description of Frama_C_interval_split() is that the function returns a value between the bounds passed to it and that the results provided by the analyzer hold for all values that it can return. If a simple Büchi automaton is plugged in that fails when output Z is emitted, it is relatively easy to understand on the basis of this main() function what we mean by “Z is emitted” and that Z can never be emitted if the analysis passes without the automaton raising a flag. This is how we can verify, as an example, the LTL property G !oZ.


As you can see in the pseudo-code, the Büchi automaton is updated with an input and the corresponding computed output, which are considered simultaneous. When calculate_output() fails to return an output because of a failed assertion, no pair (input, output) is presented to the automaton. To illustrate, consider the hypothetical property "output X occurs before input A". This property is weird, because each input A-F can happen at each instant, including the first one. It may seem that the property is tautologically false. With our interpretation, this property would be true if the system always failed the instant it read A until it had output X. These inputs would not count because the Büchi automaton would not see them.

The above is what we considered the most reasonable interpretation of what the competition organizers intended, at the time we sent our answers. On October 1, the organizers published example solutions that made us realize that they intended something else, and we had to retract our participation.

Differences with the official interpretation, and remaining unknowns

As the challenge period was coming to a close, organizers published solutions to additional problems 10 and 11 (the challenge was supposed to close at the end of September. The solutions illustrating the semantics were published on October 1 and the deadline extended).

One can only assume that these solutions also define the semantics of LTL formulas for problems 1-9, since no example is available for these. Unfortunately, there are subtle differences between the two families of programs.


Outputs corresponding to invalid inputs

The transition functions in problems 1-9 can return -1 or -2. They only return -2 when the rest of the program state was unchanged. The organizers communicated to us that traces in which -2 is emitted indefinitely should be ignored:

The same holds by the way for invalid inputs, i.e., calculate_output returning 2 [sic]. They are not meant to be part of the traces that LTL properties talk about - a liveness property such as F oW hence does not become false alone by infinitely often executing an invalid input.

I did not see this last point made explicit on the website.

For problems 1-9, since the program state is unchanged when -2 is returned, it seems that the easiest way to comply is to treat -2 as a fatal error that interrupts the trace. We do not lose much of importance in doing so because any trace with -2 outputs can be shortened into a trace without these outputs. We do lose some inputs during such a shortening, so it is still slightly invasive and we would like to be sure that this is what the organizers intended. Indeed, the main() function provided by the organizers does seem to treat -1 and -2 differently, and does call the inputs that lead to -2 “invalid”.

        if(output == -2)
                fprintf(stderr, "Invalid input: %d\n", input);
        else if(output != -1)
                        printf("%d\n", output);

The code above matches more or less the “if an unexpected input event is provided, an error message is printed and the ECA system terminates” statement on the competition's website. The code above does not stop on -2 but it seems clear that the “unexpected input” from the explanation is the -2 from the code and that, as informally stated, it interrupts the current trace.


You might expect programs 10-13 to follow the same pattern, but it turns out that instead of sometimes -1 and sometimes -2, they always return -2, both when the program state has been changed by the transition function and when it hasn't. You might still hope to interpret -2 as “error”, but in programs 10 and 11 it happens in the middle of traces that are given as example for the questions on the reachability of assertions. The solution page says:

error_46 reachable via input sequence
[E, A]

And, when you try it:

$ gcc Problem10.c 
$ ./a.out 
5
Invalid input: 5
1
Assertion failed: (0), function calculate_output, file Problem10.c, line 66.
Abort trap

So it does not look like -2 should be treated as an error after all. In the above organizer-provided trace, the error caused by input 5 (E) does not make error_46 unreachable (actually, I have it on good authority that error_46 is unreachable when, in the transition function, return -2; is treated as a trace-interrupting error).


In conclusion, for problems 1-9, we treated -1 as just another output letter and -2 as a trace-interrupting error. This does not correspond to what the organizers did in their example solutions to problem 10. We still do not know what the organizers intended. The transition function returning -2 instead of -1 in problems 10-13 seems particularly inscrutable when the organizers have previously told us by e-mail to ignore the traces with finitely many non-(-2) outputs.

Our aforelinked solutions were computed with this convention. We provide them in case someone wants to use the same convention and compare their results to ours.


Are inputs and outputs simultaneous or are they different events?

Again, we have no idea. In our model, the output happens at the same time as the input it was computed from.

But in example traces provided by the organizers, it looks like they are distinct events!! An example trace is [iD, oV, iB] ([oY, iC, oV, iB])*.

But if input and output are separate events, why is the first LTL formula for problem 10 expressed as (! iA WU (oU & ! iA)) ? The atom (oU & ! iA) makes no sense, because if the event is any sort of output event (U or another), then it cannot be an input event. The atom (oU & ! iA) is just equivalent to oU. All LTL properties that talk about inputs and outputs in the same instant are similarly nonsensical.

In our solutions, influenced by the shape of the competition's LTL formulas which talk about input and output in the same instant, we considered that input and corresponding output are simultaneous. Both are passed to the Büchi automaton at the time of computing a transition. An input can happen together with the output -1. If the transition function aborts because of an assertion or returns -2, then the input is not seen by the Büchi automaton (in the case of the -2 output, this would be easy to change by modifying the main() function. The trace could be interrupted just after the Büchi automaton transition instead of just before).


Is there any reachability LTL property at all in the competition?

The example solutions all provide infinite traces as counter-examples. This is consistent with the explanation “an LTL property holds for a certain ECA program if and only if all possible runs which do not lead to an error satisfy the LTL property”, but it is a very strange way to approach reachability questions (which indeed are no longer reachability questions at all). To answer an LTL question that would ordinarily have been a reachability question, such as “output W occurs at most twice”, in this setting, one must, when W has been encountered three times on a trace, wonder whether this trace is the prefix of at least one infinite trace. If there is a sequence of inputs that keeps the trace alive, then the trace is a counter-example. If all inputs that can be added to extend this prefix eventually lead to an error, even if it is at a very far horizon, then the trace is not a counter-example (remember not to take into account the trace that indefinitely outputs -2 when trying to answer this sub-question).

In short, in this setting, no LTL formula ever expresses a reachability property. This does not prevent us from handling these LTL properties (Sébastien assures me that our method theoretically handles all liveness properties. We certainly handle those that are already liveness properties without the weird “all possible runs that do not lead to an error” quantification).


This discussion gives me an idea for a powerful static analysis tool. I will call it Marvin. You ask it “Is this software safe? Can this dangerous angle be reached?” and it answers “Who cares? The Sun is going to blow up in five billion years. Good riddance if you ask me. Oh god, I'm so depressed…”

Thanks

As the initiator of this attempt to participate in the competition, I would like to thank my colleagues Virgile Prevosto, Boris Yakobowski and Sébastien Bardin, from whom I personally learnt a lot about various aspects of software verification; my warmest thanks are for the organizers of the 2012 RERS Grey Box Challenge.

Friday, September 21 2012

A value analysis option to reuse previous function analyses

A context-sensitive analysis

Frama-C's value analysis is context-sensitive.

This means that when a function f2() is called from a caller f1(), function f2() is analyzed as many times as the analyzer goes over f1(). Function f2() is analyzed each time with a different program state—the program state corresponding to the specific call. If f2() is called from within a loop, the analyzer may analyze it many times.

This is for the worthy cause of precision. Any attempt to summarize f2(), so that it can be analyzed once and for all, can only result in less precise analysis results.

An example

Consider the following program:

int a1 = 1;
int a2 = 2;
int a3 = 3;

int p1 = -1;
int p2 = -1;
int p3 = -1;

void calculate_output(void)
{
  if (a1 == 1 && a2 == 2 && a3 == 3)
  {
    a1 = 2;
    return;
  }
  if (a1 == 2 && a2 == 2 && a3 == 3)
  {
    a1 = 1;
    return;
  }
}

int main(void)
{
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
}

For the reason stated above, function calculate_output() is analyzed four times when this program is analyzed with the command frama-c -val:

$ frama-c -val calc.c
...
[value] computing for function calculate_output <- main.
        Called from calc.c:25.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:30.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:35.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:40.
...
[value] Values at end of function main:
          a1 ∈ {1}
          p1 ∈ {2}
          p2 ∈ {2}
          p3 ∈ {3}
...

An unnamed option

If you look at the program closely, you may notice that the third call and the first call (and respectively the fourth and second call) have a lot in common. For each of these pairs the analysis follows the same paths inside the function and changes the program state in the same way. This is because although the states differ in variables p1, p2, p3, they are identical for the variables a1, a2, a3 that are the only variables read.

There is an undocumented (and indeed, unfinished) value analysis option to take advantage of previous analyses of a function call without loss of precision. It only reuses calls in circumstances similar to those that apply to the third and fourth call to calculate_output() above. When this option is enabled, Frama-C, after analyzing a function call, computes the sets of locations that were effectively read and written for that specific call and records these inputs and outputs, as well as the program state before and after the call, in a cache for the purpose of later re-using the analysis just done.

With that option, the log output during the analysis becomes:

[value] computing for function calculate_output <- main.
        Called from calc.c:25.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:30.
...
calc.c:35:[value] Reusing old results for call to calculate_output
...
calc.c:40:[value] Reusing old results for call to calculate_output
...
[value] Values at end of function main:
          a1 ∈ {1}
          p1 ∈ {2}
          p2 ∈ {2}
          p3 ∈ {3}
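
The principle can be imitated by hand on the example program. The sketch below is not Frama-C's implementation: the names call_summary, cached_calculate_output and run_example are hypothetical, the cache has a fixed size, and the sets of locations read (a1, a2, a3) and written (a1) are hard-coded, whereas the analyzer computes them for each specific call.

```c
#include <assert.h>
#include <stddef.h>

/* Globals of the example program. */
int a1 = 1, a2 = 2, a3 = 3;
int p1 = -1, p2 = -1, p3 = -1;

void calculate_output(void)
{
  if (a1 == 1 && a2 == 2 && a3 == 3) { a1 = 2; return; }
  if (a1 == 2 && a2 == 2 && a3 == 3) { a1 = 1; return; }
}

/* Hypothetical summary of one call: the values read before the call
   and the value written by it. */
struct call_summary {
  int in_a1, in_a2, in_a3;  /* assumed to be the only locations read */
  int out_a1;               /* assumed to be the only location written */
};

#define CACHE_SIZE 16
static struct call_summary cache[CACHE_SIZE];
static size_t cache_len = 0;
int real_calls = 0;    /* calls that were actually executed */
int reused_calls = 0;  /* calls answered from the cache */

void cached_calculate_output(void)
{
  assert(cache_len <= CACHE_SIZE);
  for (size_t i = 0; i < cache_len; i++) {
    if (cache[i].in_a1 == a1 && cache[i].in_a2 == a2 && cache[i].in_a3 == a3) {
      a1 = cache[i].out_a1;  /* replay the recorded effect */
      reused_calls++;
      return;
    }
  }
  struct call_summary s = { a1, a2, a3, 0 };
  calculate_output();
  real_calls++;
  s.out_a1 = a1;
  if (cache_len < CACHE_SIZE) cache[cache_len++] = s;
}

/* Same sequence of calls as main() in the example. */
void run_example(void)
{
  for (int i = 0; i < 3; i++) {
    cached_calculate_output();
    p1 = a1; p2 = a2; p3 = a3;
  }
  cached_calculate_output();
}
```

Calling run_example() executes calculate_output() twice and replays the recorded effect for the third and fourth calls, since p1, p2 and p3 are not part of the cache key.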

History and application to the RERS challenge

The option alluded to in this post was not implemented for the purpose of the RERS 2012 competition. It was implemented earlier in the context of a large case study where it contributed to speeding up the analysis greatly. A lot of work would be involved in making this option work well in all contexts (and, actually, in making sure that derived analyses that plug into the value analysis are not confused by bits of analysis being reused).

Still, although it is unfinished, the mysterious option is finished enough to be useful in the context of the RERS competition. It is particularly useful when determining the status of liveness properties such as (F oW). Whether infinite execution traces are detected with a counter or with a quantum superposition of all previous states, the extra program state introduced by the instrumentation does not influence the behavior of function calculate_output(). In that context, it is a huge gain to be able to re-use the analysis of calculate_output(), which would otherwise, for each state, have to be duplicated for all possible predecessors.


Boris Yakobowski invented and implemented the option this post refers to, without which the hardest properties in the RERS challenge would be intractable.

Friday, August 24 2012

On writing a dedicated model-checker for the RERS competition

In recent posts, I have shown that Frama-C's value analysis could answer many reachability questions, and some questions that weren't originally phrased as reachability questions, about the programs in the RERS competition.

If you are vaguely familiar with the internals of Frama-C's value analysis, and if you tried analyzing some of the competition programs yourself, you may have noticed that these analyses use only a small fraction of the plug-in's capabilities. The analyzer is only ever propagating abstract states that correspond to singletons. It does juggle with many program states, but the programs here have small states that are encoded in just a few variables (the analyzer would have been able to manipulate states encoded on a much larger number of variables and would efficiently share in memory the values common to several of the explored states). There are no bit-fields, no structs with padding (both of which might make identical states look different if carelessly handled). The programs obviously do not execute any undefined behavior for lack of any operation that might introduce it. There is a single outermost loop. There is no arithmetic at all.

In favor of the general verifier

A specialized verifier that was designed for just these programs would have a huge opportunity to do a better job on them. On the other hand, the advantage of working on a more general verifier is that it is useful for more tasks. This makes it possible to spend more time improving it. Some of the improvements enhance the analysis of many programs, including the specific programs built only from assignments, equality tests and conditionals considered in the competition. Some of these improvements are too sophisticated to justify implementing in a verifier that only handles programs with assignments, equality tests and conditionals, because such a verifier will never be usable to find that the SHA-3 candidate Skein's reference implementation does not exhibit undefined behavior, that AES may be susceptible to timing attacks (but Skein isn't), where a couple of bugs are in an old undocumented piece of control software, that there is a subtle memory overflow in compression library QuickLZ, or that units of code have the data and control flows mandated by specification.

What a dedicated verifier might look like

In the particular case of these programs, the same propagation done by the value analysis could be done in C, by a program that would incorporate the transition function directly and execute it as compiled code. This would be much more efficient than what the value analysis does, and in this particular case, it would give the same results. From experiments interpreting Csmith programs, the value analysis slowdown with respect to compiled code can be expected to be of the order of 10000.

Accumulating reachable states

In order to reproduce what Frama-C's value analysis does, a dedicated verifier would need to store states that have already been visited, and to be able to recognize, after applying the transition function once more, whether the obtained state was one that was already visited.

In this particular case, this could be done in constant time with a hashset. Note, however, that it is only possible to compute a hash in this specialized case because all states are “singleton” states. If some states represented several values at once, e.g. a1 ∈ {1; 2} a2 ∈ {0; 1}, the good criterion would then be whether the newly computed state is included in one of the previously propagated states. Testing inclusion in one of the previous states cannot be done in constant time with a hashset (hashsets only allow you to test whether the new state is equal to an existing state, and you need to be careful to use compatible equality and hash functions).
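
For singleton states, such a hashset could look like the following sketch. Everything in it is an assumption made for illustration: the struct state layout, the fixed table size, and the FNV-style mixing of fields. The hash is computed field by field rather than over the raw bytes of the struct, so that padding cannot make equal states hash differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singleton state: one concrete value per variable. */
struct state { int a1, a2, a3; };

/* Equality and hash must be compatible: both look at the same fields. */
static bool state_equal(const struct state *x, const struct state *y)
{
  return x->a1 == y->a1 && x->a2 == y->a2 && x->a3 == y->a3;
}

static uint32_t state_hash(const struct state *s)
{
  uint32_t h = 2166136261u;           /* FNV offset basis */
  const int fields[3] = { s->a1, s->a2, s->a3 };
  for (size_t i = 0; i < 3; i++) {
    h ^= (uint32_t)fields[i];
    h *= 16777619u;                   /* FNV prime */
  }
  return h;
}

#define TABLE_SIZE 1024u  /* power of two; fixed capacity for the sketch */

struct visited_set {
  struct state slots[TABLE_SIZE];
  bool used[TABLE_SIZE];
  size_t count;
};

/* Returns true if s had not been visited yet (and records it),
   false if it was already in the set. */
bool visited_add(struct visited_set *set, const struct state *s)
{
  assert(set->count < TABLE_SIZE - 1);  /* always keep one free slot */
  uint32_t i = state_hash(s) & (TABLE_SIZE - 1);
  while (set->used[i]) {
    if (state_equal(&set->slots[i], s)) return false;
    i = (i + 1) & (TABLE_SIZE - 1);     /* linear probing */
  }
  set->slots[i] = *s;
  set->used[i] = true;
  set->count++;
  return true;
}
```

A zero-initialized struct visited_set (for instance a static one) is an empty set. A real verifier would need a table that grows, which this sketch leaves out.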

Frama-C's value analysis uses a data structure that is generally as efficient as hashsets when the manipulated states are singletons, and that often remains efficient when testing inclusion in one another of states that are not singletons.
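
To make the inclusion criterion concrete, here is a naive sketch for non-singleton states. It is not the data structure used by the value analysis: the bitmask representation (one bit per possible value between 0 and 31) and the names abs_state, state_included and covered are hypothetical, and the linear scan over all previous states is precisely the cost that a hashset cannot avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstract state: for each variable, a bitmask of the
   values it may take. Bit k set means "may be equal to k". */
struct abs_state { uint32_t a1, a2; };

/* x is included in y if every value allowed by x is also allowed by y. */
bool state_included(const struct abs_state *x, const struct abs_state *y)
{
  return (x->a1 & ~y->a1) == 0 && (x->a2 & ~y->a2) == 0;
}

/* True if s is included in at least one of the n previous states. */
bool covered(const struct abs_state *prev, size_t n, const struct abs_state *s)
{
  assert(prev != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (state_included(s, &prev[i])) return true;
  return false;
}
```

With the state a1 ∈ {1; 2} a2 ∈ {0; 1} encoded as the masks 0x6 and 0x3, the singleton state a1 ∈ {1} a2 ∈ {1} (masks 0x2 and 0x2) is covered, although it is not equal to any stored state.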

Storing states that remain to be propagated

A small thing: the propagation algorithm also requires a workqueue. Any propagation order will do (Frama-C's value analysis propagation order is not specified either), but since C comes with so few data structures, I just thought I would mention it. For a dedicated verifier in C, one would need to find some linked list implementation or write one's own. Frama-C's value analysis may interpret C slower than the code can be executed once compiled, but it comes with ready-to-use data structures for the reached set and for the workqueue. One need not even know they are there to use the analyzer.
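
As a rough idea of the amount of code involved, here is a sketch of the propagation loop with a hand-written linked list as workqueue. The transition function is a toy stand-in, not one of the competition's, the states are assumed to be small integers so that the reached set can be a plain boolean array, and the names push, pop and explore are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_STATES 8  /* hypothetical: states are the integers 0..7 */
#define NUM_INPUTS 2  /* hypothetical: inputs are 0 and 1 */

/* Toy transition function standing in for the real one. */
static int transition(int state, int input)
{
  return (2 * state + input) % 6;
}

/* Workqueue: a singly linked list used as a stack. */
struct node { int state; struct node *next; };

static void push(struct node **head, int state)
{
  struct node *n = malloc(sizeof *n);
  assert(n != NULL);
  n->state = state;
  n->next = *head;
  *head = n;
}

static int pop(struct node **head)
{
  struct node *n = *head;
  int state = n->state;
  *head = n->next;
  free(n);
  return state;
}

/* Marks in reached[] the states reachable from init; returns their number. */
int explore(int init, bool reached[NUM_STATES])
{
  struct node *work = NULL;
  int count = 1;
  for (int s = 0; s < NUM_STATES; s++) reached[s] = false;
  reached[init] = true;
  push(&work, init);
  while (work != NULL) {
    int s = pop(&work);
    for (int in = 0; in < NUM_INPUTS; in++) {
      int t = transition(s, in);
      if (!reached[t]) {       /* new state: record it and propagate later */
        reached[t] = true;
        count++;
        push(&work, t);
      }
    }
  }
  return count;
}
```

The list is used last-in first-out here only because that is the shortest thing to write; as noted above, any propagation order gives the same reached set.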

Specific drawbacks

Obviously, this ad-hoc verifier could be expected to be much faster than the value analysis. This makes the experiment tempting. What prevents me from starting work on it is the lack of generality of the resulting tool. Some examples follow.

Undefined behavior

Suppose that you had implemented such a specialized verifier for the competition's programs, and that you were looking for more transition functions written in C that the same verifier could apply to. You would certainly find some, but would you in general be certain that the transition function never exhibits undefined behavior (not for the first transition, and not for the transition from any reachable state to a successor)? If one isn't certain that the transition function does not cause undefined behavior, from a formal verification point of view, the tool is worthless. An undefined behavior can cause anything to happen. In particular, an out-of-bounds memory access in the transition function can mess up the verifier's reached set or workqueue, and compromise the results of the verification.


Any automatic verifier is susceptible to the caveat that a bug in the verifier can compromise results. This is different: your implementation could be flawless, but it would still be susceptible to a bug in the transition function, a bug in the system being verified.

Of course, you could, as a preliminary step in your verification, check that the transition function does not have any undefined behavior for any input state. If you find that there are a lot of different undefined behaviors in C and that it's a bother to detect them all, we have a tool that we have been working on. It also answers reachability questions.

General interference between unrelated traces

Even if you have only “frank” run-time errors to fear (undefined behavior that compilers kindly translate to run-time errors, at least when optimizations are disabled), a run-time error in one transition will still interrupt the entire dedicated analyzer. You will notice when this happens, and you can add a guard to the particular division by zero or NULL dereference, but this iterative process may end up taking as long as the value analysis for the same result. The value analysis, when encountering division by zero or NULL dereference, makes a note of it, considers the trace as terminated, and goes on to the propagation of the next trace. The end result, a list of run-time errors to worry about and a reachable states set, is the same, but it is obtained in a single pass.
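
For illustration, this is roughly what one such guard looks like once it has been added by hand to a dedicated verifier. The transition below is invented for the example (it is not taken from the competition), and the names guarded_step and successors are hypothetical. The point is only that the error is counted, the offending trace is dropped, and the other inputs are still explored.

```c
#include <assert.h>
#include <stddef.h>

enum status { STEP_OK, STEP_ERROR };

/* Hypothetical transition containing a division. The guard turns what
   would be a division by zero into an error status. */
enum status guarded_step(int state, int input, int *next)
{
  assert(next != NULL);
  int divisor = state - input;
  if (divisor == 0) return STEP_ERROR;
  *next = 12 / divisor;
  return STEP_OK;
}

/* Applies each of the n inputs to one state. Returns the number of
   successors written to succ[] and adds the number of terminated
   traces to *errors. */
size_t successors(int state, const int *inputs, size_t n,
                  int *succ, int *errors)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    int next;
    if (guarded_step(state, inputs[i], &next) == STEP_OK)
      succ[k++] = next;
    else
      (*errors)++;  /* note the error, drop this trace, keep going */
  }
  return k;
}
```

Every operation that can fail needs its own guard of this kind, which is the iterative process described above.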

There is also the issue of non-termination of the transition function. The value analysis detects cases of non-termination more or less quickly; again, when the infinite loop is detected it simply goes on to the next trace. With an ad-hoc verifier, if you expect the verification to take days, it may take days before you realize that the verifier is executing an infinite loop in its third considered transition.

Conclusion

In summary, considering the general C verification framework we already have, it looks like it wouldn't be a good investment of our time to develop a dedicated fast verifier for the competition—although it would provide an insightful datapoint if someone did.

Perhaps participating in the meeting will convince us that Event Condition Action (ECA) systems are more generally useful than our current experience has led us to believe. We could work on a verifier for them if there is a need. There is not much code to reuse from a general-purpose C abstract interpreter. There would be ideas to reuse, certainly. I will come to one of them in the next post.
