Frama-C news and ideas

Friday, January 17 2014

Post-conditions and names of arguments

In an ACSL post-condition, any reference to the name of one of the function's arguments is assumed to refer to the initial value of the argument.

/*@ ensures arg == 1; */
void f(int arg)
{
  arg = 1;
}

For instance, in function f above, Frama-C's value analysis plug-in will typically say that the status of the post-condition is unknown, because arg is taken to mean \old(arg), the value passed to f and thus untouched by the assignment. The rationale is that arg has already ceased to live when the post-condition is evaluated: the program could not observe or otherwise depend on the value of arg after the call to f() anyway. On the other hand, it is convenient to be able to relate results and arguments of the function, and this can be done with the simple syntax “arg”. For a global variable G, one would have to write “\old(G)” to refer in f's post-condition to the value of G just before the call to f. The syntax “G” in a post-condition would refer to the value of G at the end of the function.
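To make the distinction concrete, here is a minimal sketch (the names f, G and arg are only for illustration, and arithmetic overflow concerns are ignored) in which both post-conditions hold:

int G = 0;

/* In the two post-conditions below, "arg" denotes \old(arg), the value
   passed by the caller, whereas plain "G" denotes the value of the
   global G when f returns. */
/*@ ensures \result == arg + 1;
  @ ensures G == \old(G) + 1;
  @*/
int f(int arg)
{
  int r = arg + 1;
  arg = 0;   /* has no effect on the post-conditions above */
  G = G + 1;
  return r;
}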


But do not worry: if you forget the above subtlety, you can always spend twenty minutes adding debug messages to the value analysis plug-in until you finally remember that said subtlety is what is actually implemented.

Wednesday, July 31 2013

From Pascal strings to Python tuples

Quiz time

What does the program below do?

#include <stdio.h>

int main(){
  struct {
    int t[4];
    int u;
  } v;
  v.u = 3;
  v.t[4] = 4;
  printf("v.u=%d", v.u);
  return 0;
}

Two answers are “it prints v.u=4” and “it prints v.u=3”:

$ gcc t.c && ./a.out 
v.u=4
$ gcc -O2 t.c && ./a.out 
v.u=3

The correct answer is of course that the program invokes undefined behavior. It is not that we are using at any time an lvalue of the wrong type to access memory, breaking the so-called “strict aliasing rules”. It is not that v.t+4 is outside of object v. The problem is that v.t+4 is outside object v.t. So GCC does what it pleases, and when compiling with -O2, optimizes brutally:

$ gcc -S -O2 t.c && cat t.s
.LC0:
	.string	"v.u=%d\n"
…
	movl	$3, %edx
	movl	$.LC0, %esi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk

Frama-C's value analysis warns for the above program:

$ frama-c -val t.c
t.c:9:[kernel] warning: accessing out of bounds index {4}. assert 4 < 4;

In general, accessing t[i] when t is an array of size 4 is only valid when i < 4, but here the index is hard-coded as 4, so line 9 is only valid when 4 < 4. That is, never: all executions that reach line 9 encounter undefined behavior there.

Second quiz, same as the first quiz

What does the program below do?

#include "stdlib.h"

typedef struct{
  int tab[1];
} ts;

int main() {
  ts *q = malloc(5*sizeof(int));
  q->tab[2] = 5;
  return 1;
}

If you guessed “invoke undefined behavior”, well done!


The program above was shown to me by facetious colleague Bernard Botella, who is hard at work analyzing Python 2.7.4's runtime in the context of a project named SafePython. The snippet above is his reduced version of a larger piece of C code he found there. The issue Bernard was having started with the type definition below, and I will let you guess the rest:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[1];

   /* ob_item contains space for 'ob_size' elements.
    * Items must normally not be NULL, except during construction when
    * the tuple is not yet visible outside the function that builds it.
    */
} PyTupleObject;

In C90, the “array of size 1 as last member of a struct” was a common idiom for implementing things like Pascal strings. And of course it is just as valid for variable-length tuples. The problem is that this is not 1990 any more: compilers now use undefined behavior as an excuse to optimize aggressively, and the idiom is no longer valid at all for either tuples or Pascal strings. On the plus side, C99 gave us flexible array members (a last member with incomplete array type), a safe way to implement tuples and Pascal strings:

typedef struct {
   PyObject_VAR_HEAD
   PyObject *ob_item[];
…
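For reference, here is a sketch of how a flexible array member can be allocated and used without ever stepping outside the bounds of any object (the names tuple and tuple_new are invented for the example):

#include <stdlib.h>

typedef struct {
  size_t size;
  int item[];                 /* C99 flexible array member */
} tuple;

tuple *tuple_new(size_t n)
{
  /* Allocate the header plus room for n elements. */
  tuple *t = malloc(sizeof(tuple) + n * sizeof(int));
  if (t) t->size = n;
  return t;
}

int main(void)
{
  tuple *t = tuple_new(5);
  if (t) {
    t->item[2] = 5;           /* valid: item has 5 elements in this object */
    free(t);
  }
  return 0;
}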

Conclusion

I have encouraged my colleague Bernard to report the above as a bug in Python. This kind of bug report is usually ignored, because it denounces idioms that programmers have used for a long time and that they think they understand. Just remember: if you think you can predict what the program in the second quiz does, you should be able to predict what the program in the first quiz does (or explain what is different about it).

Thursday, July 11 2013

Arithmetic overflows in Fluorine

There is a C quiz in the middle of this post, lost in the middle of all the reminiscing.

A history of arithmetic overflows in Frama-C

From the very beginnings in 2005, until after the first Open Source release in 2008, Frama-C's value analysis was assuming that all arithmetic overflows produced two's complement results. This seemed the right thing to do at the time.

Then an option was introduced to emit alarms on undefined overflows. John Regehr suggested it after testing one of the Open Source versions. The option was turned off by default. If a value analysis user turned the option on, any undefined arithmetic overflow in the program would be detected and reported as a possible error, with the same gravity as dereferencing NULL or accessing an array out of its bounds.

Later, a helpful reminder was added to the value analysis' output: in the default behavior of not emitting alarms, an informative message was emitted instead—if such an overflow was detected—about two's complement being assumed.

There was one last change in the last release, Fluorine. Actually, two changes: the name of the option for emitting alarms on undefined overflows changed, and the default value changed. The setting is now to emit alarms by default, and can be changed to not emitting them, for instance if the target code is destined to be compiled with gcc -fno-strict-overflow -fwrapv, in which case all overflows happening during execution can be expected to produce two's complement results.


One aspect remains unchanged in the above evolution: the discussion only applies to undefined overflows.

The philosophy was always to analyze programs as they were written, and not to force any change of habit on software developers. The initial choice not to warn about overflows was because we knew there were so many of these—most of them intentional—that we would be deluging the user with what would feel like a flood of false positives.

The gradual shift towards more arithmetic overflow awareness is a consequence of the fact that in C, some arithmetic overflows are undefined behavior. Compilers display increasing sophistication when optimizing the defined behaviors to the detriment of the predictability of undefined ones. To make a long story short, the “overflows produce 2's complement results” heuristic was wrong for some programs as compiled by some optimizing compilers.

In keeping with the same philosophy, “overflows” that are defined according to the C99 standard have always been treated by the value analysis plug-in with the semantics mandated by the standard. Those overflows that the standard says must have “implementation-defined” results are treated with the semantics that the overwhelming majority of compilation platforms give them (and it remains possible to model other architectures as the need arises).

A quiz

Other static analyzers may also warn for arithmetic overflows, but the philosophy can be different. The philosophy may for instance be that any overflow, regardless of whether it is defined or implementation-defined according to the letter of the standard, might be unintentional and should be brought to the attention of the developer.

In the few examples below, the goal is to predict whether Frama-C's value analysis with its new default setting in Fluorine would emit an alarm for an overflow. For extra credit, a secondary goal is to predict whether another static analyzer that warns for all overflows would warn. We assume a 32-bit int and a 16-bit short, same as (almost) everybody has.

int a = 50000;
int b = 50000;
int r = a * b;

unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

int a = 50000;
int b = 50000;
unsigned int r = a * b;

short a = 30000;
short b = 30000;
short r = a * b;

unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Answers

int a = 50000;
int b = 50000;
int r = a * b;

There is an overflow in this snippet (mathematically, 50000 * 50000 is 2500000000, which does not fit in an int). This overflow is undefined, so the value analysis warns about it.


unsigned int a = 50000;
unsigned int b = 50000;
unsigned int r = a * b;

The multiplication is an unsigned int multiplication, and when the mathematical result of unsigned operations is out of range, the C99 standard mandates that the result wrap around. (Incidentally, 2500000000 fits in a 32-bit unsigned int, so in this particular snippet no wrap-around even takes place and r receives exactly 2500000000.) Technically, the C99 standard says “A computation involving unsigned operands can never overflow, …” (6.2.5:9) but we are using the word “overflow” with the same meaning as everyone outside the C standardization committee including Wikipedia editors.

To sum up, in the C99 standard, overflows in signed arithmetic are undefined and there are no overflows in unsigned arithmetic (meaning that unsigned overflows wrap around).


int a = 50000;
int b = 50000;
unsigned int r = a * b;

The multiplication is again a signed multiplication. It does not matter that the result is destined to an unsigned int variable because in C, types are inferred bottom-up. So the value analysis warns about an undefined overflow in the multiplication here.


short a = 30000;
short b = 30000;
short r = a * b;

There is no overflow here in the multiplication because the last line behaves as short r = (short) ((int) a * (int) b);. The justification for this behavior can be found in clause 6.3.1 of the C99 standard about conversions and promotions (the general idea is that arithmetic never operates on types smaller than int or unsigned int. Smaller types are implicitly promoted before arithmetic takes place). The product 900000000 does fit in the type int of the multiplication. But then there is a conversion when the int result is assigned to the short variable r. This conversion is implementation-defined, so the value analysis does not warn about it, but another static analyzer may choose to warn about this conversion.


unsigned short a = 50000;
unsigned short b = 50000;
unsigned int r = a * b;

Perhaps contrary to expectations, there is an undefined overflow in the multiplication a * b in this example. Right in the middle of the aforementioned 6.3.1 clause in the C99 standard, on the subject of the promotion of operands with smaller types than int, the following sentence can be found:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.

All values of a 16-bit unsigned short fit a 32-bit int, so the third line is interpreted as unsigned int r = (unsigned int) ((int) a * (int) b);.


Incidentally, things would be completely different in this last example if int and short were the same size, say if int was 16-bit. In this case the third line would be equivalent to unsigned int r = (unsigned int) a * (unsigned int) b; and would only contain an unsigned, innocuous overflow.
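For completeness, a minimal sketch of the usual way to sidestep the problem on a platform with 32-bit int is to convert one operand explicitly, so that the usual arithmetic conversions make the multiplication an unsigned one:

unsigned short a = 50000;
unsigned short b = 50000;
/* The cast makes the left operand an unsigned int; b is then converted to
   unsigned int as well, so the multiplication is done in unsigned
   arithmetic and is defined (result 2500000000). */
unsigned int r = (unsigned int) a * b;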

Wrapping up

In Fluorine, the option to activate or deactivate the emission of these undefined arithmetic overflow alarms is called -warn-signed-overflow (the opposite version for no alarms being -no-warn-signed-overflow). I felt that providing this piece of information earlier would have rendered the quiz too easy.

Although Frama-C's value analysis adheres to the semantics of C and only warns for undefined overflows, it is possible to use Frama-C to check for the other kinds of overflows by using another plug-in, Rte, to automatically annotate the target C program with ACSL assertions that express the conditions for overflows. Note that that post pre-dates the Fluorine release and is written in terms of the old options.

Wednesday, October 10 2012

RERS 2012 competition: our solutions for problems 1-9

Previously on this blog

Although it was so brief that you may have missed it, I previously mentioned here the 2012 RERS Grey Box Challenge, an interesting competition where the questions involve programs in C syntax.


I pointed out that some questions were about the reachability of assertions in the programs. As it is documented, Frama-C's value analysis can guarantee that some assertions are not reachable. When its authors are using it, they can ascertain that no approximation is taking place, and that for this reason, any assertion that the analysis diagnoses as perhaps reachable is indeed reachable for some sequence of inputs. Because of the way it works, the analysis is unable to exhibit the sequence of inputs, but one such sequence definitely exists.

We then moved on to properties expressed as LTL formulas. Some of the LTL formulas in the competition expressed reachability properties. It was easy to diagnose those by applying Frama-C's value analysis to an instrumented program. The other properties were more difficult liveness properties. With more instrumentation, involving either a transition counter or a non-deterministic predecessor state, it was possible to diagnose these too using the value analysis.

Colleagues and I have obtained solutions for all properties in the initial problems 1-9 of the competition. More information about these solutions is available in the last part of this post. First, now that the exercise is done, I would like to revisit its benefits.

Features gained, bugs fixed

New built-in Frama_C_interval_split()

A key part of the main() function used for the verification of any kind of property in the competition was:

  input = unknown_int();
  /*@ assert input == 1 || input == 2 || input == 3 ||
                input == 4 || input == 5 || input == 6 ; */

These three lines are intended to replace the following one in the original main() function provided by the organizers. The organizers' line works for interactive sessions where the operator inputs a number in the range 1..6.

        scanf("%d", &input);        


The call to unknown_int() could also have been Frama_C_interval(1, 6). The value analysis would then be able to verify the assertion, which would consequently act only as a hint rather than as an assumption. Regardless, the assertion is there, together with option -slevel, to make the analyzer study separately what happens in executions where a different value is read for input.


This is a classic trick used in many earlier studies. It is mentioned in the study of the Skein-256 hash function and in nearly all others since. Still, these assertions do get tiring to write when many cases need to be listed. Until now, there was no shortcut (I have been using Emacs macros myself).

Enter Frama_C_interval_split(l, u). It does the same thing that Frama_C_interval(l, u) does, but it automatically causes the individual values between l and u inclusive to be propagated separately, without need for a laborious ACSL assertion. The Frama_C_interval_split() built-in is available in Oxygen (I think), and since it is a true value analysis built-in, the user does not need to remember to include file builtin.c to make it accessible. A simple prototype int Frama_C_interval_split(int, int); will do.
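As a minimal sketch of the intended use (Frama_C_show_each_input below is just an arbitrary member of the Frama_C_show_each_* family, used here to observe the split):

/* No header needed: a simple prototype is enough, as mentioned above. */
int Frama_C_interval_split(int, int);
void Frama_C_show_each_input(int);

int main(void)
{
  int input = Frama_C_interval_split(1, 6);
  /* With a sufficient -slevel, each of the six values is propagated
     separately, so the message below is logged once per value. */
  Frama_C_show_each_input(input);
  return 0;
}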

The informational message “unrolling up to ... states” could be off

If you have used option -slevel (and as previously said, if you use Frama-C's value analysis at all, you should definitely use this option from time to time), you know that it sometimes logs a message to tell how much superposition has been done on the same program statement. It looks like:

...
[value] Semantic level unrolling superposing up to 100 states
...
[value] Semantic level unrolling superposing up to 200 states
...

The algorithm that takes care of this message was like this:

  target := 100;
  if new number of states attached to the statement ≥ target
  then
    print "Semantic level unrolling superposing up to " target " states"
    target := target + 100
  endif

The intention was that the value displayed would indicate the maximum number of superposed states to the nearest hundred.


Suppose the first wave of propagated states to reach a statement contains a thousand of them. The algorithm displays “superposing up to 100 states”, and updates the target to 200. If the second wave contains 2500 more states, the algorithm then displays “superposing up to 200 states” and updates the target to 300. And so on. If states keep arriving in large numbers, variable target gets irremediably behind.


This had not been noticed until this competition, but in model-checking mode, manipulating tons of states, target never gets a chance to catch up and the number displayed can be much lower than the number of states actually superposed.


This bug is present in Frama-C Oxygen. It is fixed in the development version.

An unnamed option was tested more thoroughly

Last but proverbially not least, thanks to this competition, the value analysis option that caches previous analyses of a function call in order to re-use them without loss of precision has received more testing. Who knows, someday we may even trust it enough to reveal its name.

Conclusion

The next section describes our solutions. If you did not at least attempt the competition yourself, it is unlikely that you will find it interesting: you should stop reading now. Here is my personal conclusion before you go: case studies are fun, and model-checking is a fun sub-field of computer science. At the same time, a piece of software is not an automaton: if you try to specify one as if it were the other, you will certainly make mistakes and verify something other than what you intended. I think that our experience trying to participate in the competition demonstrates that. If you are not convinced, please try to answer the competition's questions for yourself, and then, and only then, read the next section.

Our solutions to the 2012 RERS challenge

Description

The reachability of assertions is straightforward to answer with the value analysis, so I won't describe that. Here is an archive containing our answers for the LTL properties part. The archive contains mostly everything. Many files there were generated from others. In particular, the *.log files are analysis log files that took about one core-month to generate on a 2.66 GHz Intel Xeon X5650 workstation. The last file to be generated was “results”. It contains the predicted statuses of properties as interpreted with the conventions that follow.


A specific LTL property for a specific problem is checked by putting together the problem's transition function (function calculate_output() and its callees), a file buchimgt.c specific to the problem (this file contains the definition of the predecessor state and functions that manipulate it), a file main.c that contains a generic main loop, and a file such as output Z occurs after output U until output W.c that is specific to a property (but independent of the problem to which it is linked).

The problem-specific files are generated by the script genallproblemfiles.

The property-specific files such as output Z occurs after output U until output W.c were generated from templates such as output _ occurs after output _ until output _.c. The templates were hand-crafted by my colleague Sébastien Bardin to work together with the rest of the instrumentation.

The current status of the LTL property is maintained by an automaton designed for this kind of work (called a Büchi automaton). The generic main() function describes our interpretation of the problem. In pseudo-code:

forever
   read input 1..6
   compute output from input and current state
   compute transition in Büchi automaton from input and output
   if the current trace can no longer fail to satisfy the property
      prune 
   if the current trace can no longer satisfy the property
      display current state for verification
      abort analysis
   non-deterministically update predecessor state

The actual C file is designed so that the value analysis will remain complete as well as sound when analyzing it with high -slevel. A secondary objective is for the analysis to be as fast as possible despite current limitations. Readability only ranks third as an objective.

Still, regardless of the strange structure of the program, it is basically C. Any C programmer can read the main() and understand what is being verified. The only non-standard function is Frama_C_interval_split(). The informal description of Frama_C_interval_split() is that the function returns a value between the bounds passed to it and that the results provided by the analyzer hold for all values that it can return. If a simple Büchi automaton is plugged in that fails when output Z is emitted, it is relatively easy to understand on the basis of this main() function what we mean by “Z is emitted” and that Z can never be emitted if the analysis passes without the automaton raising a flag. This is how we can verify, as an example, the LTL property G !oZ.
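As a concrete illustration of that last point, here is a hedged sketch of what the loop can look like with such a degenerate one-state automaton plugged in for G !oZ. The constant Z stands for whatever integer encodes output Z in the problem at hand, and is purely hypothetical; the transition function comes from the problem file.

#define Z 26                         /* hypothetical encoding of output Z */

int calculate_output(int input);     /* the problem's transition function */
int Frama_C_interval_split(int, int);
void Frama_C_show_each_oZ_emitted(int);

int main(void)
{
  while (1) {
    int input = Frama_C_interval_split(1, 6);
    int output = calculate_output(input);
    /* Raise a flag the first time Z is emitted. If the analysis finishes
       without this message ever appearing, G !oZ holds. */
    if (output == Z)
      Frama_C_show_each_oZ_emitted(1);
  }
}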


As you can see on the pseudo-code, the Büchi automaton is updated with an input and the corresponding computed output, which are considered simultaneous. When calculate_output() fails to return an output because of a failed assertion, no pair (input, output) is presented to the automaton. To illustrate, consider the hypothetical property "output X occurs before input A". This property is weird, because each input A-F can happen at each instant, including the first one. It may seem that the property is tautologically false. With our interpretation, this property would be true if the system always failed the instant it read A until it had output X. These inputs would not count because the Büchi automaton would not see them.

The above is what we considered the most reasonable interpretation of what the competition organizers intended, at the time we sent our answers. On October 1, the organizers published example solutions that made us realize that they intended something else, and we had to retract our participation.

Differences with the official interpretation, and remaining unknowns

As the challenge period was coming to a close, organizers published solutions to additional problems 10 and 11 (the challenge was supposed to close at the end of September. The solutions illustrating the semantics were published on October 1 and the deadline extended).

One can only assume that these solutions also define the semantics of LTL formulas for problems 1-9, since no example is available for these. Unfortunately, there are subtle differences between the two families of programs.


Outputs corresponding to invalid inputs

The transition functions in problems 1-9 can return -1 or -2. They only return -2 when the rest of the program state was unchanged. The organizers communicated to us that traces in which -2 is emitted indefinitely should be ignored:

The same holds by the way for invalid inputs, i.e., calculate_output returning 2 [sic]. They are not meant to be part of the traces that LTL properties talk about - a liveness property such as F oW hence does not become false alone by infinitely often executing an invalid input.

I did not see this last point made explicit on the website.

For problems 1-9, since the program state is unchanged when -2 is returned, it seems that the easiest way to comply is to treat -2 as a fatal error that interrupts the trace. We do not lose much of importance in doing so because any trace with -2 outputs can be shortened into a trace without these outputs. We do lose some inputs during such a shortening, so it is still slightly invasive and we would like to be sure that this is what the organizers intended. Indeed, the main() function provided by the organizers does seem to treat -1 and -2 differently, and does call the inputs that lead to -2 “invalid”.

        if(output == -2)
                fprintf(stderr, "Invalid input: %d\n", input);
        else if(output != -1)
                        printf("%d\n", output);

The code above matches more or less the “if an unexpected input event is provided, an error message is printed and the ECA system terminates” statement on the competition's website. The code above does not stop on -2 but it seems clear that the “unexpected input” from the explanation is the -2 from the code and that, as informally stated, it interrupts the current trace.


You might expect programs 10-13 to follow the same pattern, but it turns out that instead of sometimes -1 and sometimes -2, they always return -2, both when the program state has been changed by the transition function and when it hasn't. You might still hope to interpret -2 as “error”, but in programs 10 and 11 it happens in the middle of traces that are given as example for the questions on the reachability of assertions. The solution page says:

error_46 reachable via input sequence
[E, A]

And, when you try it:

$ gcc Problem10.c 
$ ./a.out 
5
Invalid input: 5
1
Assertion failed: (0), function calculate_output, file Problem10.c, line 66.
Abort trap

So it does not look that -2 should be treated as an error after all. In the above organizer-provided trace, the error caused by input 5 (E) does not make error_46 unreachable (actually, I have it on good authority that error_46 is unreachable when, in the transition function, return -2; is treated as a trace-interrupting error).


In conclusion, for problems 1-9, we treated -1 as just another output letter and -2 as a trace-interrupting error. This does not correspond to what the organizers did in their example solutions to problem 10. We still do not know what the organizers intended. The transition function returning -2 instead of -1 in problems 10-13 seems particularly inscrutable when the organizers have previously told us by e-mail to ignore the traces with finitely many non-(-2) outputs.

Our aforelinked solutions were computed with this convention. We provide them in case someone wants to use the same convention and compare their results to ours.


Are inputs and outputs simultaneous or are they different events?

Again, we have no idea. In our model, the output happens at the same time as the input it was computed from.

But in example traces provided by the organizers, it looks like they are distinct events!! An example trace is [iD, oV, iB] ([oY, iC, oV, iB])*.

But if input and output are separate events, why is the first LTL formula for problem 10 expressed as (! iA WU (oU & ! iA)) ? The atom (oU & ! iA) makes no sense, because if the event is any sort of output event (U or another), then it cannot be an input event. The atom (oU & ! iA) is just equivalent to oU. All LTL properties that talk about inputs and outputs in the same instant are similarly nonsensical.

In our solutions, influenced by the shape of the competition's LTL formulas which talk about input and output in the same instant, we considered that input and corresponding output are simultaneous. Both are passed to the Büchi automaton at the time of computing a transition. An input can happen together with the output -1. If the transition function aborts because of an assertion or returns -2, then the input is not seen by the Büchi automaton (in the case of the -2 output, this would be easy to change by modifying the main() function. The trace could be interrupted just after the Büchi automaton transition instead of just before).


Is there any reachability LTL property at all in the competition?

The example solutions all provide infinite traces as counter-examples. This is consistent with the explanation “an LTL property holds for a certain ECA program if and only if all possible runs which do not lead to an error satisfy the LTL property”, but it is a very strange way to approach reachability questions (which indeed are no longer reachability questions at all). To answer a LTL question that would ordinarily have been a reachability question, such as “output W occurs at most twice”, in this setting, one must, when W has been encountered three times on a trace, wonder whether this trace is the prefix of at least one infinite trace. If there is a sequence of inputs that keeps the trace alive, then the trace is a counter-example. If all inputs that can be used to extend this prefix eventually lead to an error, even if it is at a very far horizon, then the trace is not a counter-example (remember not to take into account the trace that indefinitely outputs -2 when trying to answer this sub-question).

In short, in this setting, no LTL formula ever expresses a reachability property. This does not prevent us from handling these LTL properties (Sébastien assures me that our method theoretically handles all liveness properties. We certainly handle those that are already liveness properties without the weird “all possible runs that do not lead to an error” quantification).


This discussion gives me an idea for a powerful static analysis tool. I will call it Marvin. You ask it “Is this software safe? Can this dangerous state be reached?” and it answers “Who cares? The Sun is going to blow up in five billion years. Good riddance if you ask me. Oh god, I'm so depressed…”

Thanks

As the initiator of this attempt to participate in the competition, I would like to thank my colleagues Virgile Prevosto, Boris Yakobowski and Sébastien Bardin, from whom I personally learnt a lot about various aspects of software verification; my warmest thanks go to the organizers of the 2012 RERS Grey Box Challenge.

Friday, September 21 2012

A value analysis option to reuse previous function analyses

A context-sensitive analysis

Frama-C's value analysis is context-sensitive.

This means that when a function f2() is called from a caller f1(), function f2() is analyzed as many times as the analyzer goes over f1(). Function f2() is analyzed each time with a different program state—the program state corresponding to the specific call. If f2() is called from within a loop, the analyzer may analyze it many times.

This is for the worthy cause of precision. Any attempt to summarize f2(), so that it can be analyzed once and for all, can only result in less precise analysis results.

An example

Consider the following program:

int a1 = 1;
int a2 = 2;
int a3 = 3;

int p1 = -1;
int p2 = -1;
int p3 = -1;

void calculate_output(void)
{
  if (a1 == 1 && a2 == 2 && a3 == 3)
  {
    a1 = 2;
    return;
  }
  if (a1 == 2 && a2 == 2 && a3 == 3)
  {
    a1 = 1;
    return;
  }
}

int main()
{
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
  p1 = a1;
  p2 = a2;
  p3 = a3;
  Frama_C_dump_each();
  calculate_output();
}

For the reason stated above, function calculate_output() is analyzed four times when this program is analyzed with the command frama-c -val:

$ frama-c -val calc.c
...
[value] computing for function calculate_output <- main.
        Called from calc.c:25.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:30.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:35.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:40.
...
[value] Values at end of function main:
          a1 ∈ {1}
          p1 ∈ {2}
          p2 ∈ {2}
          p3 ∈ {3}
...

An unnamed option

If you look at the program closely, you may notice that the third call and the first call (and respectively the fourth and second call) have a lot in common. For each of these pairs the analysis follows the same paths inside the function and changes the program state in the same way. This is because although the states differ in variables p1, p2, p3, they are identical for the variables a1, a2, a3 that are the only variables read.

There is an undocumented (and indeed, unfinished) value analysis option to take advantage of previous analyses of a function call without loss of precision. It only reuses calls in circumstances similar to those that apply to the third and fourth call to calculate_output() above. When this option is enabled, Frama-C, after analyzing a function call, computes the sets of locations that were effectively read and written for that specific call and records these inputs and outputs, as well as the program state before and after the call, in a cache for the purpose of later re-using the analysis just done.

With that option, the log output during the analysis becomes:

[value] computing for function calculate_output <- main.
        Called from calc.c:25.
...
[value] computing for function calculate_output <- main.
        Called from calc.c:30.
...
calc.c:35:[value] Reusing old results for call to calculate_output
...
calc.c:40:[value] Reusing old results for call to calculate_output
...
[value] Values at end of function main:
          a1 ∈ {1}
          p1 ∈ {2}
          p2 ∈ {2}
          p3 ∈ {3}

History and application to the RERS challenge

The option alluded to in this post was not implemented for the purpose of the RERS 2012 competition. It was implemented earlier in the context of a large case study where it contributed to speed up the analysis greatly. A lot of work would be involved in making this option work well in all contexts (and, actually, in making sure that derived analyses that plug into the value analysis are not confused by bits of analysis being reused).

Still, although it is unfinished, the mysterious option is finished enough to be useful in the context of the RERS competition. It is particularly useful when determining the status of liveness properties such as (F oW). Whether infinite execution traces are detected with a counter or with a quantum superposition of all previous states, the extra program state introduced by the instrumentation does not influence the behavior of function calculate_output(). In that context, it is a huge gain to be able to re-use the analysis of calculate_output(), which would otherwise, for each state, have to be duplicated for all possible predecessors.


Boris Yakobowski invented and implemented the option this post refers to, without which the hardest properties in the RERS challenge would be intractable.

Friday, August 24 2012

On writing a dedicated model-checker for the RERS competition

In recent posts, I have shown that Frama-C's value analysis could answer many reachability questions, and some questions that weren't originally phrased as reachability questions, about the programs in the RERS competition.

If you are vaguely familiar with the internals of Frama-C's value analysis, and if you tried analyzing some of the competition programs yourself, you may have noticed that these analyses use only a small fraction of the plug-in's capabilities. The analyzer is only ever propagating abstract states that correspond to singletons. It does juggle with many program states, but the programs here have small states that are encoded in just a few variables (the analyzer would have been able to manipulate the states encoded on a much larger number of variables and would efficiently share in memory the values common to several of the explored states). There are no bit-fields, no structs with padding (both of which might make identical states look different if carelessly handled). The programs obviously do not execute any undefined behavior for lack of any operation that might introduce them. There is a single outermost loop. There is no arithmetic at all.

In favor of the general verifier

A specialized verifier that was designed for just these programs would have a huge opportunity to do a better job on them. On the other hand, the advantage of working on a more general verifier is that it is useful for more tasks. This makes it possible to spend more time improving it. Some of the improvements enhance the analysis of many programs, including the specific programs built only from assignments, equality tests and conditionals considered in the competition. Some of these improvements are too sophisticated to justify implementing in a verifier that only handles programs with assignments, equality tests and conditionals, because such a verifier will never be usable to find that the SHA-3 candidate Skein's reference implementation does not exhibit undefined behavior, that AES may be susceptible to timing attacks (but Skein isn't), where a couple of bugs are in an old undocumented piece of control software, that there is a subtle memory overflow in compression library QuickLZ, or that units of code have the data and control flows mandated by specification.

What a dedicated verifier might look like

In the particular case of these programs, the same propagation done by the value analysis could be done in C, by a program that would incorporate the transition function directly and execute it as compiled code. This would be much more efficient than what the value analysis does, and in this particular case, it would give the same results. From experiments interpreting Csmith programs, the value analysis slowdown with respect to compiled code can be expected to be of the order of 10000.

Accumulating reachable states

In order to reproduce what Frama-C's value analysis does, a dedicated verifier would need to store states that have already been visited, and to be able to recognize, after applying the transition function once more, whether the obtained state was one that was already visited.

In this particular case, this could be done in constant time with a hashset. Note, however, that it is only possible to compute a hash in this specialized case because all states are “singleton” states. If some states represented several values at once, e.g. a1 ∈ {1; 2}, a2 ∈ {0; 1}, the right criterion would then be whether the newly computed state is included in one of the previously propagated states. Testing inclusion in one of the previous states cannot be done in constant time with a hashset (hashsets only allow you to test whether the new state is equal to an existing state, and you need to be careful to use compatible equality and hash functions).
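For the singleton case, a minimal C sketch of such a reached set could look like the following (the state layout, the hash function and the bucket count are all illustrative; hashing and comparing field by field avoids any worry about padding bytes):

#include <stdlib.h>

typedef struct { int a1, a2, a3; } state;   /* illustrative state layout */

#define NBUCKETS 65536u

typedef struct node { state s; struct node *next; } node;
static node *buckets[NBUCKETS];

static unsigned hash(const state *s)
{
  unsigned h = 2166136261u;                 /* FNV-style mixing; anything decent will do */
  h = (h ^ (unsigned)s->a1) * 16777619u;
  h = (h ^ (unsigned)s->a2) * 16777619u;
  h = (h ^ (unsigned)s->a3) * 16777619u;
  return h % NBUCKETS;
}

/* Returns 1 if s had already been visited, 0 otherwise (and records it). */
static int visited_or_add(const state *s)
{
  unsigned h = hash(s);
  for (node *p = buckets[h]; p; p = p->next)
    if (p->s.a1 == s->a1 && p->s.a2 == s->a2 && p->s.a3 == s->a3)
      return 1;
  node *n = malloc(sizeof *n);
  if (!n) abort();
  n->s = *s;
  n->next = buckets[h];
  buckets[h] = n;
  return 0;
}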

Frama-C's value analysis uses a data structure that is generally as efficient as hashsets when the manipulated states are singletons, and that often remains efficient when testing inclusion in one another of states that are not singletons.

Storing states that remain to be propagated

A small thing: the propagation algorithm also requires a workqueue. Any propagation order will do (Frama-C's value analysis propagation order is not specified either), but since C comes with so few data structures, I just thought I would mention it. For a dedicated verifier in C, one would need to find some linked list implementation or write eir own. Frama-C's value analysis may interpret C slower than the code can be executed once compiled, but it comes with ready to use data structures for the reached set and for the workqueue. One need not even know they are there to use the analyzer.
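A workqueue for this purpose can be as small as the following sketch (LIFO order; the state type and the malloc/abort convention are the same illustrative ones as in the previous sketch):

typedef struct cell { state s; struct cell *next; } cell;
static cell *worklist;                      /* NULL when empty */

static void push(const state *s)
{
  cell *c = malloc(sizeof *c);
  if (!c) abort();
  c->s = *s;
  c->next = worklist;
  worklist = c;
}

/* Returns 1 and fills *out if a state was pending, 0 if the queue is empty. */
static int pop(state *out)
{
  cell *c = worklist;
  if (!c) return 0;
  worklist = c->next;
  *out = c->s;
  free(c);
  return 1;
}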

Specific drawbacks

Obviously, this ad-hoc verifier could be expected to be much faster than the value analysis. This makes the experiment tempting. What prevents me from starting work on it is the lack of generality of the resulting tool. Some examples follow.

Undefined behavior

Suppose that you had implemented such a specialized verifier for the competition's program, and that you were looking for more transition functions written in C that the same verifier could apply to. You would certainly find some, but would you in general be certain that the transition function never exhibits undefined behavior (not for the first transition, and not for the transition from any reachable state to a successor)? If one isn't certain that the transition function does not cause undefined behavior, from a formal verification point of view, the tool is worthless. An undefined behavior can cause anything to happen. In particular, an out-of-bounds memory access in the transition function can mess up the verifier's reached set or workqueue, and compromise the results of the verification.


Any automatic verifier is susceptible to the caveat that a bug in the verifier can compromise results. This is different: your implementation could be flawless, but it would still be susceptible to a bug in the transition function, a bug in the system being verified.

Of course, you could, as a preliminary step in your verification, check that the transition function does not have any undefined behavior for any input state. If you find that there are a lot of different undefined behaviors in C and that it's a bother to detect them all, we have a tool that we have been working on. It also answers reachability questions.

General interference between unrelated traces

Even if you have only “frank” run-time errors to fear—undefined behavior that compilers kindly translate to run-time errors, at least when optimizations are disabled—, a run-time error in one transition will still interrupt the entire dedicated analyzer. You will notice when this happens, and you can add a guard to the particular division by zero or NULL dereference, but this iterative process may end up taking as long as the value analysis for the same result. The value analysis, when encountering division by zero or NULL dereference, makes a note of it, considers the trace as terminated, and goes on to the propagation of the next trace. The end result, a list of run-time errors to worry about and a reachable states set, is the same, but it is obtained in a single pass.

There is also the issue of non-termination of the transition function. The value analysis detects cases of non-termination more or less quickly; again, when the infinite loop is detected it simply goes on to the next trace. With an ad-hoc verifier, if you expect the verification to take days, it may take days before you realize that the verifier is executing an infinite loop in its third considered transition.

Conclusion

In summary, considering the general C verification framework we already have, it looks like it wouldn't be a good investment of our time to develop a dedicated fast verifier for the competition—although it would provide an insightful datapoint if someone did.

Perhaps participating in the meeting will convince us that Event Condition Action (ECA) systems are more generally useful than our current experience has led us to believe. We could work on a verifier for them if there is a need. There is not much code to reuse from a general-purpose C abstract interpreter. There would be ideas to reuse, certainly. I will come to one of them in the next post.

Thursday, August 23 2012

Technical interlude

The current series of posts is pretty technical, huh? Here is a refreshing diversion in the form of a technical post that also touches on existential questions. What is this blog for? We don't know.

Best solutions, or solutions anyone could have found

If you have been following the current series of posts, you may have noticed that a recurring dilemma is whether the goal is to demonstrate the capabilities of the software tool under consideration (Frama-C's value analysis plug-in) when pushed to its limits by its authors, or to demonstrate what anyone could do without being a creator of the tool, taking advantage of Linux, Mac OS X and sometimes Windows packages for the software, and, most importantly, having only the available documentation to understand what the software makes possible.


The general goal of the blog is to empower users. In fact, much of the aforementioned documentation is available as posts in the blog itself. These posts sometimes describe techniques that were undocumented until then, but describe them so that they wouldn't be undocumented any more. Some of the posts even describe techniques that weren't undocumented at the time of writing. As my colleague Jean-Jacques Lévy would say, “repetition is the essence of teaching”.


However, I do not expect that most software tool competitions are about measuring the availability and learnability of tools. For instance, the SV-COMP competition explicitly forbids anyone other than the authors of the tool to participate with it:

A person (participant) was qualified as competition contributor for a competition candidate if the person was a contributing designer/developer of the submitted competition candidate (witnessed by occurrence of the person’s name on the tool’s project web page, a tool paper, or in the revision logs).

Ostensibly, the tools are the participants in the competition. Involving the authors is only a mean to make the competition as fair as possible for the tools. Some colleagues and I have expressed the opinion that, for static analysis, at this time, this was the right way:

In general, the best way to make sure that a benchmark does not misrepresent the strengths and weaknesses of a tool is to include the toolmakers in the process.


Some competitions on the other hand acknowledge that a participant is a person(s)+tool hybrid, and welcome entries from users that did not create the tools they used. This is perfectly fine, especially for the very interactive tools involved in the cited competition, as long as it is understood that it is not a tool alone, but a hybrid participant, that is being evaluated in such a competition. Some of the participating tools participated in several teams, associated to several users (although the winning teams did tend to be composed of toolmakers using their respective tools).


It is not clear to me where on this axis the RERS competition falls. It is a “free-style” competition. The entries are largely evaluated on results, with much freedom with respect to the way these were obtained. This definitely can also be fun(*).

The competition certainly does not require participating tools to be packaged and documented in such exquisite detail that anyone could have computed the same results. As such, the best description to write is the description of what the authors of the value analysis can do with it with all the shortcuts they trust themselves to take. If some of the questions can be solved efficiently using undocumented features, so be it. It is the free-style style.

But the mission of the blog remains to document, and showing what we have done without explaining how to reproduce it sounds like boasting. I am torn.


(*) An untranslatable French idiom, “bon public”, literally “good audience”, used in an adjective position, conveys a number of subtle, often humorous nuances. I think this applies to me and software competitions. They just all seem fun to me.

A necessary built-in

The last post made apparent the usefulness of a value analysis built-in that would halt the analyzer, if only such a built-in were available.

Several analysis log snippets in the previous post show a ^C as last line, to indicate that the log already contains the answer to the question being investigated, and that the user watching the analysis go by can freely interrupt it then.

I cheated when editing the post, by the way. I am not fast enough to interrupt the analysis just after the first conclusive log message. But I did interrupt these analyses, which otherwise might have gone on for a long time for no purpose.

I thought I would enjoy my colleagues' horrified reaction so I told everyone about my intention of writing a value analysis built-in, like some already exist for memcpy() or memset(), for halting the analyzer. Not halting the currently considered execution, mind you. There is /*@ assert \false; */ for that, and if you need this functionality available as a C function, you can put /*@ assert \false; */ inside your own function, like I did for assert() earlier when prepping the competition programs for Frama-C.
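For reference, a sketch of such an assert() replacement (close to the one used when prepping the competition programs, though the exact version may differ):

void assert(int x)
{
  if (!x)
    {
      /*@ assert \false; */   /* the value analysis stops propagating this execution trace here */
    }
}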


Clearly, a built-in for halting the analyzer does not have a contract: its effects are meta-effects that transcend the framework (in what would, for these colleagues, be the bad sense of the word).

A time-saver

Today I set out to write this built-in, because I started to get serious about the competition and it would be convenient to just instrument the program with:

if ( /* current state identical to predecessor state */ )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_halt_analysis();
}

instead of having an operator, or even a shell script, watch the log for the Frama_C_show_each_cycle_found message and interrupt Frama-C then.


Soon after opening the file and finding my place, I realized that this feature was already available. I am telling you this because this blog is for the benefit of transparency, for documenting things and for empowering users, but mostly because I want to see my colleagues' horrified reaction when they read about this trick. Actually, I would not use it if I were you. It might cease to work in a future version.

if ( /* current state identical to predecessor state */ )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_cos(0.0, 0.0);
}

You see, Frama-C value analysis built-ins, such as Frama_C_cos(), do not have an intrinsic type. You can include a header that will give them a type if you wish:

double Frama_C_cos(double x);

Otherwise, the old C89 rules for missing function prototypes apply. During parsing, a type is inferred from the types of the arguments at the call site.

Only if and when the analysis actually reaches the Frama_C_cos(0.0, 0.0) call will the analyzer realize that it is passed arguments that are incompatible with the OCaml function that implements the built-in. An exception will be raised. Since there is no reason to try to recover from a built-in misuse, this exception is caught only at the highest level. It stops the entire analyzer (it marks the entire analysis results as unreliable, as it should, but of course it does not retroactively unemit the Frama_C_show_each_cycle_found message in the log that says a conclusion has been reached).


The result looks like this, this time without cheating:

[value] Called Frama_C_show_each_cycle_found({1})
...
[value] computing for function any_int <- main.
        Called from Problem6_terminateU_pred.c:9516.
[value] Done for function any_int
[value] Semantic level unrolling superposing up to 1400 states
[value] user error: Invalid argument for Frama_C_cos function
[value] user error: Degeneration occured:
        results are not correct for lines of code
        that can be reached from the degeneration point.
[kernel] Plug-in value aborted because of invalid user input.

This is convenient for timing the analyzer as it tackles some of the harder problems. In the case above, the time was 5m35s answering a liveness question about Problem 6. It would have been a pain to watch the log for that long, fingers poised to interrupt the analyzer.


That's all for this interlude. Expect exciting timings in the next post. There may also be talk of undocumented features that transcend the problem, in the good sense of the word.

Wednesday, August 22 2012

Proving (F oW) true or false: alternative method

In the context of the RERS 2012 competition, the previous post points out that value analysis option -obviously-terminates can be used to prove the termination of an instrumented program, created such that its termination implies that the property (F oW) is true of the original competition program. Unfortunately, this only works in one direction. If the property (F oW) does not hold of the original competition program, this approach cannot be used to prove that. The previous post continued to offer a method, involving a counter, that works in both directions.

Preliminary remark regarding the counter method

The second method proposed in the previous post requires a state count to use as bound. I suggested to use the number of reachable states of the original (uninstrumented) program, obtained by running Frama-C's value analysis as a sound and complete abstract interpreter.

I would like to point out that instead of the state count of the original program, it is possible to use the state count of the program instrumented to stop when W is emitted. This number is by definition lower. On the other hand, the number of states of the uninstrumented program can be computed once and for all, and then be used for all formulas of the form (F oW) that apply to that program in the competition. In fact we can obtain that number for free when we check the reachability of assert(0); statements for the competition. So there is a trade-off there.

Applying the method of last post

Let us now consider Problem3.c (small-hard). One of the properties to evaluate is (F oY). The first method described in the previous post does not conclude rapidly, so there is reason to suspect that perhaps the property is false for this program. But if this is going to be our answer in the competition, we had better establish that the property is definitely false, that is, there exists an infinite execution along which Y is not emitted.

Instrumenting with a counter

To establish that the property is false using a counter, we instrument Problem3.c thus:

--- Problem3.c	2012-08-08 10:49:53.000000000 +0200
+++ Problem3_terminate_at_Y_with_counter.c	2012-08-22 13:58:30.000000000 +0200
@@ -1,9 +1,8 @@
 void assert(int x)
 {
   if (!x)
     {
-      Frama_C_show_each_assert_reached();
       /*@ assert \false; */
     }
 }
 
@@ -1649,15 +1648,16 @@
 	    } 
 	    if((((((((!(a26==1)&&(a6==1))&&!(a27==1))&&(a12==1))&&(a3==0))&&(a11==1))&&(a5==1))&&(a18==1))){
 	    	error_2: assert(0);
 	    } 
-	    return -2; 
+	    /*@ assert \false ; */
 	}
 
 int main()
 {
     // default output
     int output = -1;
+    int counter = 0;
 
     // main i/o-loop
     while(1)
     {
@@ -1668,6 +1668,10 @@
 
         // operate eca engine
         output = calculate_output(input);
 
+	/*@ assert output != y ; */
+
+	counter++;
+	Frama_C_show_each_counter(counter);
     }
 }

Obtaining a bound

In order to obtain a bound, we analyze the original program, using the value analysis as a model-checker:

$ frama-c -val -slevel 100000000 Problem3.c
...
[value] Semantic level unrolling superposing up to 200 states
...

This message means there are up to a number between 200 and 300 states attached to a single statement during the analysis of the program. That statement is probably the statement just after a new input has been picked non-deterministically between 1 and 6, and the number is therefore six times higher than the number of states that interests us. The abstract interpreter also distinguishes program states that differ only in the variable output, again making the bound displayed in the “superposing up to” messages higher than the number we mean. But let us not try to compensate for that, just to be on the safe side.


I kid, I kid. Using the “superposing up to” message is not rigorous, since this message was not designed for that (and the message is not documented at all, I think). As a Frama-C developer with knowledge of value analysis internals, I am tempted to use 300 as a safe bound, but in order to obtain a rigorous number of states, anyone can insert the following statement in the while loop of the program:

Frama_C_show_each_state( a1, a4, a0, a15, a29, a10, a16, a22, a2, a17, a25, a7, a14, a19, a20, a8, a23, a21, a24, a13, a9, a28, a26, a6, a27, a12, a3, a11, a5, a18 );

Once de-duplicated, the messages printed by this call correspond exactly to the unique states of the program; filtered, sorted and de-duplicated, the list is exactly 31 entries long. This method provides an exact number and uses documented value analysis features only. The list is provided at the end of the post for the doubters.
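The filtering can be done with ordinary shell tools. The sketch below assumes that each Frama_C_show_each_state message fits on one log line, so that identical lines are identical states; the file name is only illustrative:

$ frama-c -val -slevel 100000000 Problem3_with_states_logged.c > log
$ grep Frama_C_show_each_state log | sort -u | wc -l
31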

Looking for an execution with more than 31 transitions

Knowing that we are looking for an execution where counter goes higher than 31, we can now launch the analysis of the version instrumented with counter:

$ frama-c -val -slevel 100000000 -no-val-show-progress Problem3_terminate_at_Y_with_counter.c
...
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
Problem3_terminate_at_Y_with_counter.c:5:[value] Assertion got status invalid (stopping propagation).
[value] Called Frama_C_show_each_counter({2})
...
[value] Called Frama_C_show_each_counter({31})
...
[value] Called Frama_C_show_each_counter({31})
[value] Called Frama_C_show_each_counter({32})
...
[value] Called Frama_C_show_each_counter({32})
^C

This means that there is at least one infinite execution in the instrumented program, which means that the original program can execute infinitely without emitting Y.

A method not involving a transition counter

An irritating feature of the previously described method is that it does not immediately detect infinite executions that are right under its nose. There could exist, for a certain input, a transition from the initial state to itself, and the analyzer would still have to explore all other transitions, breadth-first, down to a depth of 31, before displaying the Called Frama_C_show_each_counter({32}) message that confirms the existence of an infinite execution.

A method to look for this cycle of length one would be to instrument the program differently. This is what I describe now.

Add variables to the program for the description of the predecessor state

This means transforming the declarations

	int a1 = 1;
	int a4 = 1;
...

into:

	int a1 = 1, p1 = -1;
	int a4 = 1, p4 = -1;
...

This can be done with a regexp.

Check whether a newly reached state coincides with its predecessor state

At the end of each iteration of the loop, we compare the new state with the predecessor state. Again, code doing this can be generated with a regexp:

        if 
        ((a1 == p1) &&
        (a4 == p4) &&
...
        (a18 == p18))
          Frama_C_show_each_cycle_found(1);

The above causes a message to be emitted by the analyzer as soon as a cycle is found.

Update the predecessor state

After checking whether the new state coincides with the predecessor state, it is time to update the predecessor state. Yada regexp yada:

        p1 = a1;
        p4 = a4 ;
...
        p18 = a18 ;

And presto: we detect all cycles. All cycles of length one, that is.

Maybe update the predecessor state

Although it would work great on the example that motivated this new method, detecting only cycles of length one is a little bit too specific. Let us fix that.

At the end of each iteration, instead of updating the predecessor state as above, let us maybe update it as below:

    if (any_int())
    {
            p1 = a1;
            p4 = a4 ;
...
            p18 = a18 ;
    }

With the above non-deterministic update of the predecessor state, for each vector a1, a4, ..., a18 of variables propagated by the value analysis, the variables p1, p4, ..., p18 contain a nondeterministic superposition of all of its predecessors. Think of it as programming a quantum computer, and if you think of it this way, remember that the variables p1, p4, ..., p18 are entangled: when observed, they contain matching values from one specific predecessor. They are entangled because the conditional above either updates the variables p1, p4, ... all at once or leaves them all unchanged.


On the other hand, different predecessors do not get mixed together because... well, because the analyzer remains complete on this program even after it has been instrumented in this way, for reasons I still do not want to document. Nevertheless, the important part of the argument is that a sound and complete analyzer would find the Frama_C_show_each_cycle_found(1); statement above reachable if and only if, in the program instrumented to stop at Y, a state is identical to one of its predecessors. Hopefully it is possible to convince oneself of that much without being a value analysis expert, although I admit that programming nondeterministic machines is an unusual exercise.


Using Frama-C's value analysis as a sound and complete abstract interpreter, the statement Frama_C_show_each_cycle_found(1); will be reached, and a message will be displayed, exactly when a state is reached that is identical to one of its predecessors. This happens if and only if there is a cycle in the transition graph of the program instrumented to stop at Y. In terms of the original program, this corresponds to (F oY) being false.
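
Putting the three instrumentation steps together, the body of the main loop might look like the sketch below (p1, p4, ..., p18 are the predecessor variables introduced above, any_int() is the same nondeterministic helper used for the inputs, and y again stands for the program's encoding of output Y):

    while (1)
    {
        int input = any_int();
        /*@ assert input == 1 || input == 2 || input == 3 ||
                    input == 4 || input == 5 || input == 6 ; */

        output = calculate_output(input);

        // stop exploring executions as soon as they emit Y
        /*@ assert output != y ; */

        // a state equal to one of its (superposed) predecessors
        // reveals a cycle in the transition graph
        if ((a1 == p1) &&
            (a4 == p4) &&
            /* ... */
            (a18 == p18))
          Frama_C_show_each_cycle_found(1);

        // nondeterministically record the current state as a predecessor
        if (any_int())
        {
            p1 = a1;
            p4 = a4;
            /* ... */
            p18 = a18;
        }
    }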

Putting the new method to the test

Running the value analysis on the program instrumented as described:

$ frama-c -val -slevel 100000000 -no-val-show-progress Problem3_terminate_at_Y_with_predecessor.c 
...
[value] Semantic level unrolling superposing up to 100 states
[value] Semantic level unrolling superposing up to 200 states
[value] Called Frama_C_show_each_cycle_found({1})
^C

Incidentally, on this example, the new method is much faster than the method involving a counter. It is not clear that this should always be the case, and on difficult examples, the two methods can perhaps be tried in parallel.

Annex: an exhaustive list of states for Problem 3

Below is the list of unique states displayed when the statement

Frama_C_show_each_state( a1, a4, a0, a15, a29, a10, a16, a22, a2, a17, a25, a7, a14, a19, a20, a8, a23, a21, a24, a13, a9, a28, a26, a6, a27, a12, a3, a11, a5, a18 );

is inserted in the while loop of Problem3.c.

[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{1},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{0},{1},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{0},{1},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{1},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{1},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{1},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{2},{0},{1},{1})


This post benefited from discussions with Sébastien Bardin, Sven Mattsen, Benjamin Monate and Boris Yakobowski.

Tuesday, August 21 2012

Participating in the RERS 2012 competition: liveness questions

This third installment discusses the diagnosis of liveness properties of particular C programs, using Frama-C's value analysis, for the RERS 2012 competition.


[Note added in September 2012: the diagnostic found for the example in this post disagrees with the diagnostic found later and sent to the competition organizers. I am unable to reproduce the result obtained here. Perhaps it was obtained with a version of Frama-C that had a bug.]

Liveness properties 101

Here is another extract of two properties found in the property list attached to Problem 1 in the competition:

(G ! oW)
output W does never occur
----------------------------------------

(F oW)
output W occurs eventually

To tell you how unaccustomed to temporal properties I am, I first read the second formula as a negation of the first one. There are redundancies in the lists of formulas to evaluate, and I thought this was one more of them. It is of course not the case: (F oW) is not the negation of (G ! oW).

(G ! oW) says that there is no execution path that leads to W being output; the negation of that is that there is at least one such execution path. By contrast, (F oW) expresses that all execution paths have a W being output at some point, a much stronger property.
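
To convince oneself that the two formulas really are independent, here is a tiny hypothetical loop (output_W() is a made-up stand-in for whatever emits W) for which neither (G ! oW) nor (F oW) holds:

    while (1)
    {
        int input = any_int();
        if (input == 1)
            output_W();   // some execution emits W, so (G ! oW) is false
        // an execution that never receives input == 1 never emits W,
        // so (F oW) is false as well
    }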


Indeed, (F oW), expressing that a state in which W is output is eventually reached, is a typical liveness property. Checking this kind of property is at odds with traditional abstract interpretation practices, where the analysis is considered done when a fixpoint has been reached: the fixpoint does not tell whether all executions eventually leave the loop. It only tells what is guaranteed for those executions that leave the loop.
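
A minimal illustration, unrelated to the competition programs: for the loop below, the abstract interpreter quickly reaches a fixpoint, but that fixpoint says nothing about termination.

    int x = any_int();
    while (x != 0)
    {
        // x is never modified: the fixpoint for this loop is simply
        // "x may hold any non-zero value here", reached after one
        // iteration. Executions with x != 0 never leave the loop,
        // and the fixpoint does not reveal this; it only tells us
        // that x == 0 holds for the executions that do leave it.
    }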

Semi-solution: a simple program transformation

Please bear with me and consider the simple modification of Problem1.c below, applied on top of the previous instrumentation:

--- Problem1.c	2012-08-08 10:15:00.000000000 +0200
+++ Problem1_terminate_at_W.c	2012-08-21 17:00:38.000000000 +0200
@@ -1,9 +1,8 @@
 void assert(int x)
 {
   if (!x)
     {
-      Frama_C_show_each_assert_reached();
       /*@ assert \false; */
     }
 }
 
@@ -594,6 +593,8 @@
 
         // operate eca engine
         output = calculate_output(input);
 
+	// continue only if output not W yet
+	/*@ assert output != w ; */
     }
 }

We remove the user message in our assert() function because we want to temporarily forget about assertions. Besides, by instrumenting the main loop, we make the program stop when the output is W.

You should be able to convince yourself that the modified program terminates if and only if the original program does not admit any infinite execution that never outputs W (for any infinite sequence of values assigned to variable input): the only behavior the instrumentation adds is to cut an execution short at the precise moment it outputs W.


Earlier posts (1, 2) in this blog pointed out how Frama-C's value analysis could be made to forget that it was an abstract interpreter, and verify the termination of programs.

Application

Since everything is ready for this particular property of Problem1.c, let us run the verification. The command to prove that the program terminates is as follows:

$ frama-c -val -obviously-terminates -no-val-show-progress Problem1_terminate_at_W.c
...
[value] Semantic level unrolling superposing up to 400 states
[value] done for function main
[value] ====== VALUES COMPUTED ======

We should have used a timeout, because the analysis fails to terminate when the analyzer is unable to prove that the target program terminates. Nevertheless, we were lucky: the target program terminates, the analyzer is able to verify that it does, and we can conclude that there are no infinite executions of the original Problem1.c that fail to emit W.

Assertions and consequences on liveness properties

The previous section only tells us that there is no infinite execution that does not emit W. Depending on the organizers' intentions, this may or may not answer the question of whether (F oW) holds.

It still depends on whether you count an execution that reaches one of the assert(0); calls in the program as, well, an execution. If it counts as an execution to which the formula should apply, then an execution that does not emit W and terminates on such an assert(0); certainly does not eventually output W.

There is an argument to say that it shouldn't count, however: although it is impossible to understand what these programs are supposed to do, assertions usually characterize illegal behaviors. Perhaps the LTL formulas are to be evaluated only on the legal, infinite behaviors of the system?

In the latter case, we are done: (F oW) holds because W is eventually emitted in all legal executions. In the former case, the additional question is whether an original assert(0); is reached before W is output, which is a simple reachability question on the instrumented Problem1_terminate_at_W.c. The answer is then that (F oW) does not hold, because there are plenty of assert(0); calls that are reached without W having been emitted.


We e-mailed the organizers, and they confirmed the simpler interpretation: only infinite executions need be taken into account for evaluating LTL formulas, and interrupted executions can be ignored. Our answer for the formula (F oW) of Problem 1 in the competition is therefore that the property holds.

Proving that something does not eventually happen

In the first part of the post, we set out to prove that the liveness property (F oW) was true. Verifying that the property is false requires a different approach. With the toolset provided by Frama-C, I only see the scheme below. We reduce the problem to the seemingly harder but actually easier problem of computing the maximum number of iterations before W is emitted.

  1. Obtain the number N of reachable states for the system by running the value analysis on the original program.
  2. Instrument the program, as previously making execution terminate when W is emitted, and also maintaining a transition counter (incremented at each iteration of the loop).
  3. Using sound and complete propagation on the instrumented program, either reach a fixpoint (with a finite bound computed for the counter) or find an execution with N+1 transitions that does not emit W. The former case means that the program eventually emits W. The latter means that, in the transition graph, there is a reachable cycle along which W is not emitted, and therefore an infinite execution that does not output W.


This technique likely gets pretty heavy in practice. Adding a counter that may go up to N+1 to a system that already has N states can probably cause the new system to have on the order of N² states. This segues into at least one, perhaps two, more blog posts, to answer the questions “what is the value analysis actually doing when it is applied on these questions?”, “how much work would it be to do the same thing closer to the metal, executing the C function as compiled instead of laboriously interpreting it?”, and “why don't we make a C model checker while we are at it?”

Monday, August 20 2012

Participating in the RERS 2012 competition: reachability questions

This post is a follow-up to this one. It begins to describe our participation, using Frama-C's value analysis, in the RERS 2012 competition.

In theory

The competition features nine target programs of increasing difficulty. Some of the questions to be answered about the target programs are reachability questions. This is the case for all calls to assert(0); in the program: for each call, the question is exactly whether it can be reached in an execution.

In addition, some LTL formulas are offered about each target program. The goal here is to individually diagnose each LTL formula as true or false when applied to the program. Some of the LTL formulas actually express reachability properties. Consider the following property, from the list that applies to problem 1 (small-easy):

(G ! oU)
output U does never occur

This formula holds if a program state in which U is output is never reached. It suffices to instrument the program a little bit to recognize when this happens, and the problem is reduced to a reachability problem, only in reverse: the property holds if the states are unreachable, and does not hold if one of the states can be reached.
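
As a sketch of the kind of instrumentation involved (u is a placeholder for the program's encoding of output U), it is enough to add one line after the output is computed in the main loop:

        output = calculate_output(input);
        // the property (G ! oU) holds if and only if this assertion
        // can never fail, i.e. its failure branch is unreachable
        assert(output != u);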

In practice

The good news is that the value analysis is, at its core, designed to answer reachability questions. More specifically, it is designed to answer unreachability questions: it computes supersets of the values that each variable can take at each point of the target program, and it warns if dangerous instructions can be reached. It can guarantee that an assert(0); call is not reached.


Preparing the program for analysis as described in the manual and many times on this blog may result for instance in the following changes: removing the headers, defining a function assert(), removing the printf() calls so that one does not need to provide the function.

--- Problem1.c~	2012-07-31 01:32:42.000000000 +0200
+++ Problem1.c	2012-08-08 10:15:00.000000000 +0200
@@ -1,5 +1,12 @@
-#include <stdio.h> 
-#include <assert.h>
+void assert(int x)
+{
+  if (!x)
+    {
+      Frama_C_show_each_assert_reached();
+      /*@ assert \false; */
+    }
+}
+
 
 	// inputs
 	int a= 1;
@@ -582,14 +589,11 @@
     {
         // read input
         int input;
-        scanf("%d", &input);        
+	input = any_int();
+/*@ assert input == 1 || input == 2 || input == 3 || input == 4 || input == 5 || input == 6 ; */
 
         // operate eca engine
         output = calculate_output(input);
 
-        if(output == -2)
-        	fprintf(stderr, "Invalid input: %d\n", input);
-        else if(output != -1)
-			printf("%d\n", output);
     }
 }

We assume that an execution that reaches an assert(0); is interrupted, and that input at each cycle is a number between 1 and 6. This corresponds to our understanding of the description of the programs in the competition.


As previously hinted, the value analysis is designed to answer reachability questions negatively only. It is intended to guarantee that dangerous operations are unreachable (i.e. the program is safe). In doing so, it computes whether any statement is reachable, even non-dangerous ones, and with what variable values each of these statements is reached.

In static analysis, this is called “soundness”. The value analysis does not in general guarantee that any statement is reachable. However, for the particular program above, instrumented in this particular way, with the right command-line options, the value analysis is “complete” in addition to being sound: in addition to only diagnosing as unreachable statements that are actually unreachable, it only diagnoses as reachable statements that are actually reachable.

Being complete in addition to being sound means that the value analysis is able to answer the question (G ! oU) negatively as well as positively. If the value analysis was only sound, it could only guarantee that a particular assert(0); call is not reached, or that U is never output. Being complete means that it can in addition guarantee that an assert(0); call is reached in some execution or that U is output in some execution.

More formulas to evaluate

There are more complex LTL formulas in the list that express reachability properties, sometimes modulo trivial instrumentation. Below are two of them:

(! oZ WU (oZ WU (! oZ WU (oZ WU (G ! oZ)))))
output Z occurs at most twice
----------------------------------------

(! oU WU oZ)
output U does never occur before output Z

The second formula expresses that a state where U is emitted without Z ever having been emitted is not reached. To verify this property, one simply needs to add to the target program a boolean variable that remembers whether Z has been emitted before.

The first is a complex LTL formula with a simple English interpretation. The program can be instrumented with a counter for the number of times Z is output, and the property to check is whether a state where Z has been emitted three times is reached.
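
A possible sketch of both instrumentations, assuming z and u stand for the program's encodings of outputs Z and U, and with z_seen and z_count declared and initialized to 0 before the loop:

        output = calculate_output(input);

        if (output == z)
        {
            z_seen = 1;    // Z has now been emitted at least once
            z_count++;     // number of times Z has been emitted
        }

        // (! oU WU oZ): a state where U is emitted before any Z
        // must be unreachable
        assert(!(output == u && !z_seen));

        // "output Z occurs at most twice": a state where Z is emitted
        // a third time must be unreachable
        assert(z_count <= 2);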

Do not make completeness claims about Frama-C's value analysis at home

You may have noticed that the value analysis' manual rarely offers precision guarantees, and in particular never describes the conditions in which it is complete.

Therefore, an independent participant in the competition who wasn't a value analysis developer would not be able to assert that the value analysis is behaving completely. Such a participant could only hope to answer reachability questions negatively, finding that (G ! oU) is true (when it is actually true) or finding that a particular assertion is never reached (when it is actually unreachable). This is the direction that matters when trying to establish that error states are unreachable, to verify the safety of software. The competition organizers implicitly acknowledge that this kind of property can be difficult to establish otherwise when they say, about weighting answers:

The weighting scheme gives you a way to express your personal confidence in your results. e.g. if you have found a feasable path to some error, this is safe, and you should weight it 10. Lifeness properties or stating that certain errors are non-feasible is of course more risky.

Here we are using a sound analyzer to establish that certain errors are non-feasible, so in our case there is little risk involved in attributing the maximum weight to these answers. There is in fact more risk when answering that some execution path leads to some error since the value analysis, even when it happens to be complete, is not designed to produce a trace that we could double-check. But we are confident for these errors too because of all the Csmith testing that we applied earlier in the year.


And by the way, this blog post does not count as documentation: it is still undocumented in what circumstances Frama-C's value analysis is complete. The documentation already has its hands full describing how to use the value analysis soundly, even though soundness is its defining property. There is no point in enumerating the very specific circumstances in which it is complete. The conditions are unlikely to occur simultaneously in real life, and if you think that this competition contradicts this statement, have another look at the shape of the competition's target programs.
