Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - skein

Entries feed - Comments feed

Saturday, December 31 2011

Do not use AES in a context where timing attacks are possible

Justification

There recently was a thread in the Frama-C mailing list on verifying the Rijndael cipher, standardized and better-known as AES. Nowadays, AES is mostly famous for being sensitive to timing attacks. An attacker measuring the time it takes to encrypt known plaintext with an unknown key can deduce the key (key quote: "Using secret data as an array index is a recipe for disaster").

This brings me back to an experiment with University of Minho researchers where Frama-C's value analysis was used to verify that cryptographic C functions have constant execution time. The justification and the conclusion are the same as in this blog post that had served as the starting point of that experiment.


There are two aspects to constant-time programming: checking that the execution path does not depend on secrets, and (remember the key quote above) checking that array indices do not depend on secrets. I implemented the first check as an extension of Frama-C's dependencies analysis. Manuel Bernardo Barbosa et al similarly implemented the second one, and they used both analyses to verify that cryptographic functions in the NaCl library were safe from timing attacks. I had a glance at their implementation but I had not made the effort to port it to new Frama-C releases (I assume they had not either).

Example

Consider the function f() below:

int t[10] = {0, 1, 2, 2, 1, 8, 9, 17, 54, 79};

int u[10];

int g;
int a;

void f(int n)
{
  int i;
  int *p;

  for (i = 0; i < t[n] * t[n+1] - 3; i++)
    {
      p = u + i + a;
      if (g)
	*p = i;
    }
}

Execution path dependencies analysis relies on standard dependencies analysis (option -deps), which rely on the value analysis. Say you are interested in the execution times of function f() when array t has its initial values, global variables a and g are each 0 or 1, and the argument n is between 0 and 5. You would then create the following analysis context, as you would for any verification based on the value analysis:

main(){
  a = Frama_C_interval(0, 1);
  g = Frama_C_interval(0, 1);
  f(Frama_C_interval(0, 5));
}

Analyzing this complete program for execution path dependencies:

$ frama-c -experimental-path-deps -deps share/builtin.c ep.c
...
Computing path dependencies for function f
Path dependencies of f: t[0..6]; g; n

Within the initial conditions defined by the function main() (and in particular, n between 0 and 5), the execution path depends on the initial values of t[0..6], g and n at the time f() is called. Note that variables p, a and u are not included in this list, although they are used, because the execution path does not depend on them. So if f() was a cryptographic function and variable a contained a secret, you would not have to worry that this secret can be deduced by observing through timing that the function has different execution paths depending on a.


But does this mean that you are safe from timing attacks, though? As demonstrated in the article linked first in this post, you should also worry that a malicious adversary will infer information from the array indices used in f():

$ frama-c -experimental-mem-deps -deps share/builtin.c ep.c
...
Computing mem dependencies for function f
Mem dependencies of f: t[0..6]; a; n

The above option, that I quickly re-implemented when I saw that the verification of cryptographic functions was being discussed in the mailing list, tells which of f()'s inputs influence the computation of array indices, in the same way that option -experimental-path-deps tells which of f()'s inputs influence the execution path inside f(). In this example, it tells that the contents of input variable a can leak through a cache-timing attack (because a is used in the computation of p, and later p may be dereferenced for writing).

What about AES, then?

$ frama-c rijndael.c -cpp-command "gcc -C -E -I /usr/local/Frama-C/share/frama-c/libc" -no-annot -slevel 9999 mymain.c -experimental-mem-deps -deps
...
Mem dependencies of rijndaelEncrypt: rk; Nr_0; pt; ct; in[0..15]; key[0..39];
                                     Te0[0..255]; Te1[0..255]; Te2[0..255]; Te3[0..255]

The analysis finds that there is a risk that in[0..15] (the input buffer) and key[0..39] (the key) may be leaked through cache-timing attacks. The other variables are Nr_0, the number of rounds, pt and ct, just pointers, and the Tei arrays, the S-boxes. Those aren't secrets so it doesn't matter that they are listed.

Like every other analysis in Frama-C, this analysis always makes approximations on the safe side, so we expected to find this result (since we knew the attack had been realized). The question is, is the analysis precise enough to conclude that another cryptographic function is safe? The answer is yes: the hash function Skein-256 can be shown to have an execution time that depends only on the length of the input buffer, but not of its contents.

Implementation

The option -experimental-path-deps has been available as a hidden, proof-of-concept feature inside the official Frama-C distribution roughly since the Valgrind blog post was published in 2010.

I re-implemented the analysis of memory accesses dependencies analysis demonstrated here as -experimental-mem-deps. It has been available since Frama-C version Oxygen.

Note that the feature was very lightly tested. The new option -experimental-mem-deps was written in 45 minutes and the examples in this post are the only testing it received. If you feel that automatic analyses such as these are too automatic (they don't force you to write invariants for loops in the target code, for instance), you can instead spend your time usefully by devising test suites to make sure they work as intended in all cases.

Conclusion

One must be careful with a posteriori studies. There are plenty of bugs for which, once they have been pointed out, it is tempting to say "hey, I could have found it with this or that static analysis tool! My tool finds these bugs. Everyone is so stupid not to be using it". The point is that, much like a dowser being tested, you only found the bug with your tool when someone else had told you where it was. Had you tried to find it before being told, perhaps you would have given up right in the setup phase, or you would have had so many false positives to review that you would never have gotten round to the bug, or you would have gotten round to it but would have dismissed it quickly as another false positive.

So I want to make it absolutely clear that I am not saying that the AES implementation conveniently posted in the mailing list should have been checked with Frama-C when it was written. This is a causal impossibility. However, timing attacks are a recognized family of attacks now, and cryptographic functions analyze rather well (based on experiments with AES and Skein), so Frama-C is definitely a tool that can be used nowadays to verify that new cryptographic functions are being implemented according to the state of the art.


This blog post owes to Adam Langley for describing the method in the context of Valgrind, to Manuel Bernardo Barbosa and his students for courageously writing the first version of the array indices dependencies analysis (which I am sure took them more than 45 minutes) and applying it, and to 罗婷 for posting on frama-c-discuss a version of AES that was ready to analyze.

Thursday, June 2 2011

Skein tutorial, part 7: not dead, but resting

Do you remember the Skein tutorial? It went off to a good start (it started this blog) and then was never really completed. I blame laziness. Looking back at that first post, I notice that indeed, before Boron, we were shipping software without documentation (just think! Ahem). At the time, I also thought that the extended tutorial would be over in a couple of posts, a mistake I have learnt not to make any more. September 2010, those were the days…

The story so far

The solution provided in this post was one of two promised solutions. It was in fact my own solution. While conceiving it, I was determined not to look at Skein's source code. I could claim that this was to make the verification method completely independent of the code, which is obviously good. It allows, for instance, to swap in another implementation — say, another entry in the NIST SHA3 contest — and hopefully to verify it just by relaunching the same script. The true reason I didn't want to look at the code is more likely written in bold at the top of this post.

Because I didn't want to look at Skein's source code, my solution involved the meta-argument that you can cover all possible executions of the loop

  while (Frama_C_interval(0,1))
  {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
  }

by analyzing instead the two pieces of code below, which between them do all the same things:

  while (Frama_C_interval(0,1))
  {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
  }
  arbitrarize_msg();
  Skein_256_Update(&skein_context, msg, 80);

  while (Frama_C_interval(0,1))
  {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
  }

The alternate solution

In July 2010, I was teaching at the FLOLAC summer school and I was lucky to have Josh Ko as my Teaching Assistant. He did not have my prejudices against looking at the code being verified. The alternate, and better, solution that follows is his.


We determined when we elaborated the solution based on pairing the calls to Skein_256_Update(), that the issue was the field skein_context.h.bCnt, that contained 0 before the first iteration, and alternately 16 or 32 after that. Pairing the calls to Skein_256_Update() was one way to force similar calls to be analyzed together and prevent a loss of precision caused by not knowing whether skein_context.h.bCnt was 16 or 32. Josh Ko's solution instead prevents the loss of precision by nudging the value analysis into studying these cases separately by way of an annotation inside Skein_256_Update():

--- skein.c.orig	2011-06-02 21:06:29.000000000 +0200
+++ skein.c	2011-06-02 21:06:50.000000000 +0200
@@ -159,6 +159,8 @@
 
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);     /* catch uninitialized context */
 
+    /*@ assert ctx->h.bCnt == 0 || ctx->h.bCnt == 16 || ctx->h.bCnt == 32 ; */
+
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
         {

We have seen how to provide useful annotations in this post. The above annotation is another example. With it, we can use this main():

void main(void)
{
  int i;
  u08b_t hash[HASHLEN];

  Skein_256_Ctxt_t skein_context; 
  Skein_256_Init(&skein_context, HASHLEN * 8);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

  while (Frama_C_interval(0,1))
    {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
    }
  Skein_256_Final( &skein_context, hash);
}
$ time frama-c -val -slevel 1000 -slevel-function main:0 *.c -cpp-command "gcc -m32 -C -E -I. " > log2

real	1m48.218s
user	1m45.133s
sys	0m1.325s

This single main() cover all the programs with n>=2 calls to Skein_256_Update(…, 80):

$ grep ssert log2
skein.c:162:[value] Assertion got status valid.

And that's that: no alarm is emitted, and the annotation that we provided for the value analysis is verified before being used (so it is not an assumption).

For the sake of completeness, the cases of 0 and 1 calls to Skein_256_Update(…, 80) should also be checked separately. We have done that before, so there is no need to repeat it here.

Generalizing

The advantage of the method presented in this post over the previous one is that it can more easily be generalized to different buffer lengths passed to Skein_256_Update(). The more complete verification that I last alluded to uses, among other annotations, the assertion below.

    /*@ assert                                                                                                                 
      ctx->h.bCnt ==  0 || ctx->h.bCnt ==  1 ||                                                                                 
      ctx->h.bCnt ==  2 || ctx->h.bCnt ==  3 ||                                                                                 
      ctx->h.bCnt ==  4 || ctx->h.bCnt ==  5 ||                                                                                 
      ctx->h.bCnt ==  6 || ctx->h.bCnt ==  7 ||                                                                                 
      ctx->h.bCnt ==  8 || ctx->h.bCnt ==  9 ||                                                                                 
      ctx->h.bCnt == 10 || ctx->h.bCnt == 11 ||                                                                                 
      ctx->h.bCnt == 12 || ctx->h.bCnt == 13 ||                                                                                 
      ctx->h.bCnt == 14 || ctx->h.bCnt == 15 ||                                                                                 
      ctx->h.bCnt == 16 || ctx->h.bCnt == 17 ||                                                                                 
      ctx->h.bCnt == 18 || ctx->h.bCnt == 19 ||                                                                                 
      ctx->h.bCnt == 20 || ctx->h.bCnt == 21 ||                                                                                 
      ctx->h.bCnt == 22 || ctx->h.bCnt == 23 ||                                                                                 
      ctx->h.bCnt == 24 || ctx->h.bCnt == 25 ||                                                                                 
      ctx->h.bCnt == 26 || ctx->h.bCnt == 27 ||                                                                                 
      ctx->h.bCnt == 28 || ctx->h.bCnt == 29 ||                                                                                 
      ctx->h.bCnt == 30 || ctx->h.bCnt == 31 ||                                                                                 
      ctx->h.bCnt == 32 ; */

Tuesday, January 11 2011

Value-analysis assisted verification of output variables and information flow

A previous post in the value analysis thread in this blog contained this digression:

Actually, we can also prevent function Update to keep trace of a previous input buffer, thanks to secondary analyses derived from the value analysis. One of them computes the set of locations written to by a C function. With this analysis, we can list the entire set of memory locations that Update uses as state, and check that only information that should be kept is. Perhaps there will be a blog post on this topic.

Before the Carbon release, I wasn't in a hurry to talk about this. The value-analysis assisted computations of output variables and information flow are part of the many improvements in Carbon. In the Boron release and before, when applied on the Skein codebase, they gave results that were too approximated to be useful.

Now, they just work.

One example use of these computations is provided as part of the tutorial in the Carbon version of the value analysis manual (it's section 2.5.2 there).

Rather than repeating here this addition to the tutorial, I want to reiterate on the notion of derived analysis. Although the outputs and dependencies computations rely on the results of the value analysis, they do not require you to check each of the alarms generated by the value analysis.

When a sequence of function calls has dependencies (as computed by option -deps) of hash[0..7] FROM msg[0..79]; ONE[bits 0 to 7] (and SELF), it means that if all goes well, array hash after the calls has been computed from array msg and variable ONE. Computing and taking advantage of this information is orthogonal to the verification that all always goes well during the execution of these function calls (which is done by making sure that the value analysis emits no alarm or by checking the emitted alarms with alternative techniques).

In more difficult cases, the latter may require a combination of human reflexion and of hours of computer time. But functional dependencies can be useful even when based on the results of a simpler, faster value analysis.

Tuesday, November 23 2010

Value analysis tutorial, part 5: jumping to conclusions

This post is in two parts, both of them independently good fits for the title, and still not completely without relation to each other, but that's probably a coincidence.

Methodology

In this thread, we aim at the verification of low-level properties for the functions in Skein. In the last post, I closed one line of reasoning as if the objective had been reached. But someone might object. "How do you know you can trust the verifier?", ey might say. "What have you really proved?". And, "are you asking me to believe that all cases are covered with separate analyses for 0, 1, even numbers greater than 2, and odd numbers greater than 3?".

Answers, in reverse order:

  1. yes. In C, you cannot call a function a negative number of times, so I am asking you to believe that this partition covers the "calling Update(...,80) an arbitrary number of times" part of the informal specification we settled on from the beginning.
  2. The middle question refers to the fact that there is a translation of sorts from the informal requirement or specification to the formal one. This is not unlike the translation in traditional verification from the informal requirement to executable test (or code review) plans. Yes, there are dangers here, but the dangers are not new. Both with testing or with more formal verification as we have been outlining, you can make unjustified assumptions, or forget a critical case. That weakness does not prevent replacing some tests and code reviews with formal methods, since all these techniques have the weakness. If anything, formal specifications that are slightly more expressive than individual test cases or review objectives are slightly closer to informal speech, and could be argued to have a slightly lower, and therefore slightly less risky, translation step.
  3. The first question is even less original than the middle one. We gain confidence in the verifier with tests and code reviews. They didn't suddenly stop working, you know.

Jumping ahead to the conclusion of the Skein-256 verification

All the techniques to deploy have not been exposed yet in this blog, but now that the thread gives a good idea of how absence of run-time errors can be verified in very general circumstances using the value analysis, I thought I would hint at the final results. One of the techniques to deploy is (as of November 2010) the next Frama-C release, which contains many optimizations and performance improvements, so there is no hurry for me to go into the details (yet).

Following roughly the same method, we have verified in-house that there wasn't any alarm when calling Skein_256_Update an arbitrary number of times between Skein_256_Init and Skein_256_Final, using for each call to Skein_256_Update a value n for the length, and an input buffer of length exactly n, with for each call an arbitrary, independent n between 1 and 128.

The question of what has been proved is even more acute than previously. The above informal specification could have several interpretations. Let us just say that one part of the main function used for the verification looks like this:

...
while(Frama_C_interval(0,1))
  {
    {
      int c = Frama_C_interval(0,1);
      if (c)
        {
          u08b_t msg[1];
          arbitrarize_msg(msg, 1);
          Skein_256_Update( &skein_context, msg, 1);
          continue;
        }
    }
    {
      int c = Frama_C_interval(0,1);
      if (c)
        {
          u08b_t msg[2];
          arbitrarize_msg(msg, 2);
          Skein_256_Update( &skein_context, msg, 2);
          continue;
        }
    }
    ...

In the verification described in detail until now, we re-use the same buffer for each call to Skein_256_Update. This is a small weakness in the verification plan: there could be a bug that happens only with two calls with two different buffers. For instance, Skein_256_Update could memorize the address of the buffer passed the first time, and access it on the second call, when the buffer may no longer exist. It goes to show that there is some validity to the question "what have you really proved?", but in this example, as often, the problem is with the ambiguous, informal specification and not with the formal method itself.

Using local variables as in the above C code snippet, the possibility of such a breach can be eliminated: we cannot prevent function Update to keep trace of a previous input buffer in its state, but if it then accessed it, or even just compared it to another address, this would be detected as an operation on a dangling pointer and an alarm would be emitted.

Digression: actually, we can also prevent function Update to keep trace of a previous input buffer, thanks to secondary analyses derived from the value analysis. One of them computes the set of locations written to by a C function. With this analysis, we can list the entire set of memory locations that Update uses as state, and check that only information that should be kept is. Here is a future blog post on this topic.

With the future Frama-C release, the verification takes about 5h and 150MiB of memory on a computer that was near top-of-the-line in 2006 (Intel Xeon 5150). Since we are speaking of performance, this analysis is single-threaded. With the memory footprint that it now has, there is hope that it will get a bit faster in time as a larger and larger proportion of the working set fits in lower and lower levels of memory cache. A parallel value analysis applicable to some verification plans (but untried for this one) is also at the proof-of-concept stage. Where applicable, this parallel analysis will allow to take advantage of multicore processors and computation farms.

Sunday, November 21 2010

Value analysis tutorial, part 4: one solution to second quiz

This post offers one answer to the second quiz from part 2. For context, here are links to part 1 and part 3.

The question was: how should we get rid of the last alarm below and conclude that Skein-256 is indeed safe from run-time errors when used in the conditions previously described?

lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);

The alarm appears when using the command-line:

frama-c-gui -val -slevel 100 -slevel-function main:0 *.c

Inspecting values inside function memcpy, we can see that the problematic address is again at offset 88 inside structure skein_context. Following the same procedure as in part 3, we identify this context for the alarm inside the analysis log:

...
[value] computing for function Skein_256_Update <-main.
        Called from main.c:40.
[value] computing for function memcpy <-Skein_256_Update <-main.
        Called from skein.c:171.
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
[value] Recording results for memcpy
[value] Done for function memcpy
...

Several remarks:

  1. the problematic Skein_256_Update call is the call inside the while loop (at line main.c:40). This is completely unsurprising.
  2. Function memcpy is called several times from Skein_256_Update , but the call that results in an alarm is the call at skein.c:171.

Inspecting the incriminated call site in the GUI, the address argument to memcpy is &ctx->b[ctx->h.bCnt], and the values that the analysis thinks are taken by ctx->h.bCnt at this point are 16 and 32. The length argument, n, is always 16. Since the array b inside struct skein_context is only 32 bytes, it is normal to get an alarm inside memcpy with such arguments.

Now the reason we originally unrolled a few calls to Skein_256_Update was in order to reverse-engineer what the function was doing, much as if we had inserted calls to printf inside the source code, except without the need to choose in advance the values to inspect. So let us use the GUI to check the values of skein_context.h.bCnt for the first few calls.

Using "Evaluate expression" on the first call to Skein_256_Update:

Before statement:
skein_context.h.bCnt ∈ {0; }
At next statement:
skein_context.h.bCnt ∈ {16; }

On the second call:

Before statement:
skein_context.h.bCnt ∈ {16; }
At next statement:
skein_context.h.bCnt ∈ {32; }

On the third call:

Before statement:
skein_context.h.bCnt ∈ {32; }
At next statement:
skein_context.h.bCnt ∈ {16; }

On the fourth call:

Before statement:
skein_context.h.bCnt ∈ {16; }
At next statement:
skein_context.h.bCnt ∈ {32; }

On the fifth and last call before entering the loop:

Before statement:
skein_context.h.bCnt ∈ {32; }
At next statement:
skein_context.h.bCnt ∈ {16; 32; }

Do not let yourself be distracted by the last "At next statement:" information: this one is imprecise because the "next statement" is inside the while loop. Apart from that, it is rather clear what is happening: each call after the first one changes the bCnt field from 16 to 32 or conversely. Inside the while loop, the two values end up amalgamated together. That makes it look like there may be a problem with the memcpy call. To make it clear there is no problem, we probably just need to group the calls by two. That leads us, for instance, to write the following main function to verify the safety of all sequences of an even number of calls to Skein_256_Update.

void main(void)
{
  int i;
  u08b_t hash[HASHLEN];

  Skein_256_Ctxt_t skein_context; 
  Skein_256_Init(&skein_context, HASHLEN * 8);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

  while (Frama_C_interval(0,1))
    {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
    }
  Skein_256_Final( &skein_context, hash);
}

With the command-line frama-c -val -slevel 100 -slevel-function main:0 *.c, the value analysis does not emit any alarm for this main(). The case of one call was already handled in the tutorial chapter of the value analysis documentation. The separate cases of 0 calls and of an odd number of calls larger than three are left to the reader, and this time there will not be any further refinement of this solution in a future blog post. Ask on the mailing list if you encounter a problem finishing the verification from here.

In the next episode, an alternative and more generalizable solution to the same problem. Or maybe a meta-discussion about the verification methodology being sketched out in this thread.

Friday, October 15 2010

Value analysis tutorial, part 3: answering one quiz

This is another episode in the advanced value analysis tutorial. The first episode was here and the second one here.

There were a couple of questions left unanswered at the break. This post answers the first one.

Quiz 1: continuing the study of the first five calls to Skein_256_Update, what used to cause the warning below, and what made it disappear?

lib.c:18:kernel warning: out of bounds write. assert \valid(tmp);

Value analysis messages in the log are output in order. To understand what the analysis was doing when an alarm was emitted, it is recommended to look at the surrounding entries in the log. Using the command-line frama-c -val -slevel 100 -slevel-function main:0 *.c > log on the version of main without unrolling, we get the following information about the circumstances that cause the value analysis to worry about line lib.c:18 in function memset:

...
[value] Done for function Skein_256_Update
[value] computing for function Skein_256_Final <-main.
Called from main.c:44.
[value] computing for function memset <-Skein_256_Final <-main.
Called from skein.c:212.
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
[value] Recording results for memset
[value] Done for function memset
...

Using frama-c-gui to observe the value of the troublesome pointer inside memset, we see for the main with unrolling:

tmp ∈ {{ &skein_context + [24..87] ; &cfg + [0..31] ;}}

Repeating the procedure for the main without unrolling, we see:

tmp ∈ {{ &skein_context + [24..88] ; &cfg + [0..31] ;}}

It seems that the pointer causing the problem is at an offset of 88 bytes inside skein_context. To confirm this, still while in frama-c-gui, we can select any statement of the function main and use "Evaluate expression" in the contextual menu to evaluate the expression sizeof(skein_context). The result is as follows:

[...] all the values taken by the expression sizeof(skein_context) are contained in {88; }

Note: not all C expressions can be evaluated this way, but expressions of the form sizeof(v) can, at a statement where the variable v is in scope. This is often useful to make sense of the offsets in bytes displayed by the value analysis.

So the problem is indeed that the value analysis thinks that the pointer may refer to offset 88 of skein_context when dereferenced inside memset. This address is out of range. According to the log snippet above, this happens when memset is called from Skein_256_Final. The log points us to line 212 in file skein.c, that is, the first call in that function. This call is guarded by the following conditional and comment:

if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */

      memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);

In that call, the pointer argument is & ctx->b[ctx->h.bCnt], and the values recorded at the if for ctx->h.bCnt are {0; 16; 32; }, whereas the values at the call are {0; 16; } (the analysis knows that in the case of 32, the call is not executed because the condition ctx->h.bCnt < SKEIN_256_BLOCK_BYTES is false).

If you still have the frama-c-gui with the results for the main with unrolling lying around, you can check that in that case, the values for ctx->h.bCnt are {16; 32; } at the if, and { 16; } at the call.

This leads us to the following conclusion: the imprecision that leads to the false alarm at lib.c:18 is caused by analyzing the call to memset with imprecise arguments in function Sein_256_Final. By unrolling the first few calls to Skein_256_Update, we also forced these calls to happen systematically, and that made the analysis conditions for Skein_256_Final simpler. When completing the verification with separate analyses of the omitted cases, the arguments to memset are precise again (and complementary), so the false alarm is omitted again.

In fact, if we kept investigating in the same fashion, we would find that the case ctx->h.bCnt==0 corresponds to the absence of any call to Skein_256_Update. This case is indeed covered by the while loop in the main analysis context without unrolling. The alarm we obtained when analyzing that loop does not mean there is a problem with the sequence Skein_256_Init(...); Skein_256_Final(...);, only that some peculiar values happen in this case and that analyzing this case together with the others leads to false alarms. When verifying software that has already been tested and that its authors take pride on, it makes sense to heuristically assume that alarms are false alarms when investigating them. One should however remain honest and continue to investigate until the alarm is understood—ideally, until the value analysis can confirm that there was no alarm after all.

Next time, the answer to the other quiz.

Wednesday, October 13 2010

Value analysis tutorial, part 2

In order to create a tradition of providing solutions to previous quizzes, this post is a partial answer to the question in this one about Frama-C's value analysis.

To recap: at the end of the Boron tutorial we arrive at the main function below. This function is useful as an analysis context for the verification of hashing a 80-char message with Skein-256:

#include "skein.h"

#define HASHLEN (32)

u08b_t msg[80];

void main(void)
{
  int i;
  u08b_t hash[HASHLEN];
  for (i=0; i<80; i++)
    msg[i] = Frama_C_interval(0, 255);

  Skein_256_Ctxt_t skein_context; 
  Skein_256_Init(&skein_context, HASHLEN * 8);
  Skein_256_Update(&skein_context, msg, 80);
  Skein_256_Final( &skein_context, hash);
}

The context analysis above allocates as a global array and initializes a message msg with arbitrary contents. It also allocates a structure skein_context as a local variable to be used by the library for ongoing computations.

Our goal is now to generalize the analysis to arbitrary numbers of Skein_256_Update calls. Let us replace the single Skein_256_Update call in the above context by the loop below. This loop uses Frama_C_interval(0,1) as its condition. The value analysis is guaranteed to take into account the possible results 0 and 1 for all these calls. This ensures that the results of the analysis apply to all the different execution paths we are now interested in, although these paths differ in length.

  while (Frama_C_interval(0,1))
    {
      for (i=0; i<80; i++)
        msg[i] = Frama_C_interval(0, 255);
      Skein_256_Update(&skein_context, msg, 80);
    }

Using the same command-line as in the tutorial generates warnings about possible undefined behaviors:

frama-c -val -slevel 100 *.c .../share/builtin.c > log

grep ssert log
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )1)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )2)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )3)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )4)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )5)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )6)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )7)): assert(TODO)

It is possible to increase the argument of option -slevel in order to unroll more iterations and postpone the moment in the analysis when precision is lost and alarms occur. However, since the while loop is infinite, there is no value for -slevel that lets this loop be analyzed with precision to its end.

In fact, we do not want this while loop unrolled. We want the value analysis to do standard value analysis magic and extract properties that are true at all iterations of the loop without studying each iteration individually. In other words, we want the value analysis not to unroll the while loop (but to continue unrolling the other, finite loops so as to remain as precise as it was for the single Skein_256_Update call).

One way to obtain this result is to move the for loop outside function main, so that the while loop is the only one that remains in main, and to use the option -slevel-function introduced in Frama-C Boron. This option allows "semantic unrolling" to be tuned function by function. Here, we can use it to tell the value analysis to unroll loops except inside main.

The complete analysis context becomes:

#include "skein.h"

#define HASHLEN (32)

u08b_t msg[80];

void arbitrarize_msg(void)
{
  for (int i=0; i<80; i++) msg[i]=Frama_C_interval(0, 255);
}

u08b_t hash[HASHLEN];

void main(void)
{
  int i;
  u08b_t hash[HASHLEN];

  Skein_256_Ctxt_t skein_context; 
  Skein_256_Init(&skein_context, HASHLEN * 8);
  while (Frama_C_interval(0,1))
    {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
    }
  Skein_256_Final( &skein_context, hash);
}

The new command-line to use is:

frama-c -val -slevel 100 -slevel-function main:0 *.c > log

The analysis takes a minute or so.

grep ssert log
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )1)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )2)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )3)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )4)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )5)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )6)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )7)): assert(TODO)
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);

The warnings from the previous attempt are still there, and they have a surprise guest at lib.c, line 18 (in function memset). But since the only thing we have changed since the last attempt is to prevent a loop from being unrolled, this was to be expected (you may have noticed that the analysis was a little bit faster, also as a result of not unrolling the while loop).

Now, let us try to get rid of these warnings for good. For this purpose, we need information about what is causing them. Plunging in the code with frama-c-gui is a good way to obtain some of that, but this approach lacks the ability to display values of variables for different calls to Skein_256_Update separately. Purely as an experiment, in order to have a place to click to see the state at the first, second, ..., fifth iteration, let us change the while loop to:

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);

  while (Frama_C_interval(0,1))
    {
      arbitrarize_msg();
      Skein_256_Update(&skein_context, msg, 80);
    }

Using the command-line frama-c-gui -val -slevel 100 -slevel-function main:0 *.c, the first surprise is that we have accidentally eliminated most of the alarms by hand-unrolling the while loop. Indeed, only the following now remains:

lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);

We can stick to the initial plan and observe the states after each of the first five iterations to understand what happened. This is no time to celebrate victory yet. For starters, the verification is not done until all alarms have been eliminated. Also, our investigative instrumentation has eliminated some execution paths — the executions paths with less than five iterations — that would still need to be studied separately before the verification can be called finished.

Here are the contents of variable skein_context before the first, second and third calls:

skein_context.h.hashBitLen ∈ {256; }
             .h{.bCnt; .T[0]; } ∈ {0; }
             .h.T[1] ∈ {8070450532247928832; }
             .X[0] ∈ {4072681676153946182; }
             .X[1] ∈ {5436642852965252865; }
             .X[2] ∈ {2889783295839482789; }
             .X[3] ∈ {6109786322067550404; }
             .b[0..31] ∈ UNINITIALIZED
skein_context.h.hashBitLen ∈ {256; }
             .h.bCnt ∈ {16; }
             .h.T[0] ∈ {64; }
             .h.T[1] ∈ {3458764513820540928; }
             {.X[0..3]; .b[0..15]; } ∈ [--..--]
             .b[16..31] ∈ UNINITIALIZED
skein_context.h.hashBitLen ∈ {256; }
             .h.bCnt ∈ {32; }
             .h.T[0] ∈ {128; }
             .h.T[1] ∈ {3458764513820540928; }
             {.X[0..3]; .b[0..31]; } ∈ [--..--]

Based on this information, it is now clear where the "accessing uninitialized left-value" warnings came from: there is inside the structure skein_context a buffer that only becomes initialized after the first two calls to Skein_256_Update. And these alarms are now gone because these iterations, when the buffer is not yet completely initialized, are handled separately from the latter iterations when the buffer is initialized and presumably read from at some point.

Quiz 1: continuing the study of the first five calls to Skein_256_Update, what used to cause the warning below, and what made it disappear?

lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);

Quiz 2: how should we get rid of the last alarm (that occurs in our file lib.c at line 10, function memcpy although it is caused by imprecision in the callers of that function)?

The files in their state at the end of this blog post are available here. To rewind to earlier stages of the verification process, simply erase main.c, which is the only file we have modified so far.

Thursday, September 30 2010

Progress of the value analysis tutorial

Boron's value analysis manual has the beginning of a captivating tutorial and I figure that the suspense must be unbearable. I, on the other hand, already know how it ends. But I'm not telling yet. Neener-neener! Okay, maybe later on this blog.

The new manual (with an expanded tutorial) will be released at the same time as Frama-C version Carbon, which is not immediately for various technical reasons. The new version will have some amazing performance improvements, but the Boron version is good enough for one of the steps that comes after the current tutorial ends:

Generalizing to arbitrary numbers of Update calls

As an exercise, try to verify that there cannot be a run-time error when hashing arbitrary contents by calling Update(...,80) an arbitrary number of times between Init and Final. The general strategy is to modify the C analysis context we have already written in a way such that it is evident that it captures all such sequences of calls, and also in a way such that launched with adequate options, the value analysis does not emit any warning.

The latter condition is harder than the former. Observing results (with the GUI or observation functions described in section 8.3) can help iterate towards a solution. Be creative.

Solution(s) in a few days.

Note: if you try the tutorial for the first time or on a new computer, you may also appreciate this new paragraph in the documentation:

Some compilation plaforms' headers define the names memset and memcpy in a way that make it impossible to provide your own implementation for these functions. If this happens for yours, you can try placing your own header in the analysis directory, under the name string.h


typedef unsigned int size_t;
void* memcpy(void* region1, const void* region2, size_t n);
void* memset (void* dest, int val, size_t len);