RERS 2012 competition: our solutions for problems 1-9
By pascal on Wednesday, October 10 2012, 08:15 - Permalink
Previously on this blog
Although it was so brief that you may have missed it, I previously mentioned here the 2012 RERS Grey Box Challenge, an interesting competition where the questions involve programs in C syntax.
I pointed out that some questions were about the reachability of assertions in the programs. As it is documented, Frama-C's value analysis can guarantee that some assertions are not reachable. When its authors are using it, they can ascertain that no approximation is taking place, and that for this reason, any assertion that the analysis diagnoses as perhaps reachable is indeed reachable for some sequence of inputs. Because of the way it work, the analysis is unable to exhibit the sequence of inputs, but one such sequence definitely exist.
We then moved on to properties expressed as LTL formulas. Some of the LTL formulas in the competition expressed reachability properties. It was easy to diagnose those by applying Frama-C's value analysis to an instrumented program. The other properties were more difficult liveness properties. With more instrumentation, involving either a transition counter or a non-deterministic predecessor state, it was possible to diagnose these too using the value analysis.
Colleagues and I have obtained solutions for all properties in the initial problems 1-9 of the competition. More information about these solutions is available in the last part of this post. First, now that the exercise is done, I would like to revisit its benefits.
Features gained, bugs fixed
New built-in Frama_C_interval_split()
A key part of the
main() function used for the verification of any kind of property in the competition was:
input = unknown_int(); /*@ assert input == 1 || input == 2 || input == 3 || input == 4 || input == 5 || input == 6 ; */
These three lines are intended to replace the following one in the original
main() function provided by the organizers. Their line works for interactive sessions where the operator inputs number in the range 1..6.
The call to
unknown_int() could also have been
Frama_C_interval(1, 6). Then the value analysis would be able to tell that the assertion is not an assumption, only a hint. Regardless, the assertion is there, together with option
-slevel, to make the analyzer study separately what happens in executions where a different value is read for
This is a classic trick used in many earlier studies. It is mentioned in the study of the Skein-256 hash function and in nearly all others since. Still, these assertions do get tiring to write when many cases need to be listed. Until now, there was no shortcut (I have been using Emacs macros myself).
Frama_C_interval_split(l, u). It does the same thing that
Frama_C_interval(l, u) does, but it automatically causes the individual values between
u inclusive to be propagated separately, without need for a laborious ACSL assertion.
Frama_C_interval_split() built-in is available in Oxygen (I think), and since it is a true value analysis built-in, the user does not need to remember to include file
builtin.c to make it accessible. A simple prototype
int Frama_C_interval_split(int, int); will do.
The informational message “unrolling up to ... states” could be off
If you have used option
-slevel (and as previously said, if you use Frama-C's value analysis at all, you should definitely use this option from time to time), you know that it sometimes logs a message to tell how much superposition has been done on a same program statement. It looks like:
... [value] Semantic level unrolling superposing up to 100 states ... [value] Semantic level unrolling superposing up to 200 states ...
The algorithm that takes care of this message was like this:
target := 100; if new number of states attached to the statement ≥ target then print "Semantic level unrolling superposing up to " target " states" target := target + 100 endif
The intention was that the value displayed would indicate the maximum number of superposed states to the nearest hundred.
Suppose the first wave of propagated states to reach a statement contains a thousand of them. The algorithm displays “superposing up to 100 states”, and updates the target to 200. If the second wave contains 2500 more states, the algorithm then displays “superposing up to 200 states” and updates the target to 300. And so on. If states keep arriving in large numbers, variable
target gets irremediably behind.
This had not been noticed until this competition, but in model-checking mode, manipulating tons of states,
target never gets a chance to catch up and the number displayed can be much lower than the number of states actually superposed.
This bug is present in Frama_C Oxygen. It is fixed in the development version.
An unnamed option was tested more thoroughly
Last but proverbially not least, thanks to this competition, the value analysis option that caches previous analyses of a function call in order to re-use them without loss of precision has received more testing. Who knows, someday we may even trust it enough to reveal its name.
The next section describes our solutions. If you did not at least attempt the competition yourself, it is unlikely that you will find it interesting: you should stop reading now. Here is my personal conclusion before you go: case studies are fun, and model-checking is a fun sub-field of computer science. At the same time, a piece of software is not an automaton: if you try to specify one as if it was the other you will certainly make mistakes and verify something other than you intended. I think that our experience trying to participate in the competition demonstrates that. If you are not convinced, please try to answer the competition's questions for yourself, and then, and only then, read the next section.
Our solutions to the 2012 RERS challenge
The reachability of assertions is straightforward to answer with the value analysis, so I won't describe that. Here is an archive containing our answers for the LTL properties part. The archive contains mostly everything. Many files there were generated from others. In particular, the *.log files are analysis log files that took about one core-month to generate on a 2.66 GHz Intel Xeon X5650 workstation. The last file to be generated was “results”. It contains the predicted statuses of properties as interpreted with the conventions that follow.
A specific LTL property for a specific problem is checked by putting together the problem's transition function (function
calculate_output() and its callees), a file
buchimgt.c specific to the problem (this file contains the definition of the predecessor state and functions that manipulate it), a file
main.c that contains a generic main loop, and a file such as
output Z occurs after output U until output W.c that is specific to a property (but independent of the problem to which it is linked).
The problem-specific files are generated by the script
The property-specific files such as
output Z occurs after output U until output W.c were generated from templates such as
output _ occurs after output _ until output _.c. The templates were hand-crafted by my colleague Sébastien Bardin to work together with the rest of the instrumentation.
The current status of the LTL property is maintained by an automaton designed for this kind of work (called a Büchi automaton).
main() function describes our interpretation of the problem. In pseudo-code:
forever read input 1..6 compute output from input and current state compute transition in Büchi automaton from input and output if the current trace can no longer fail to satisfy the property prune if the current trace can no longer satisfy the property display current state for verification abort analysis non-deterministically update predecessor state
The actual C file is designed so that the value analysis will remain complete as well as sound when analyzing it with high
-slevel. A secondary objective is for the analysis to be as fast as possible despite current limitations. Readability only ranks third as an objective.
Still, regardless of the strange structure of the program, it is basically C. Any C programmer can read the
main() and understand what is being verified. The only non-standard function is
Frama_C_interval_split(). The informal description of
Frama_C_interval_split() is that the function returns a value between the bounds passed to it and that the results provided by the analyzer hold for all values that it can return.
If a simple Büchi automaton is plugged in that fails when output
Z is emitted, it is relatively easy to understand on the basis of this
main() function what we mean by “
Z is emitted” and that
Z can never be emitted if the analysis passes without the automaton raising a flag. This is how we can verify, as an example, the LTL property
As you can see on the pseudo-code, the Büchi automaton is updated with an input and the corresponding computed output, which are considered simultaneous. When
calculate_output() fails to return an output because of a failed assertion, no pair
(input, output) is presented to the automaton. To illustrate, consider the hypothetical property "output X occurs before
input A". This property is weird, because each input A-F can happen at each instant, including the first one. It may seem that the property is tautologically false.
With our interpretation, this property would be true if the system always failed the instant it read A until it had output X. These inputs would not count because the Büchi automaton would not see them.
The above is what we considered the most reasonable interpretation of what the competition organizers intended, at the time we sent our answers. On October 1, the organizers published example solutions that made us realize that they intended something else, and we had to retract our participation.
Differences with the official interpretation, and remaining unknowns
As the challenge period was coming to a close, organizers published solutions to additional problems 10 and 11 (the challenge was supposed to close at the end of September. The solutions illustrating the semantics were published on October 1 and the deadline extended).
One can only assume that these solutions also define the semantics of LTL formulas for problems 1-9, since no example is available for these. Unfortunately, there are subtle differences between the two families of programs.
Outputs corresponding to invalid inputs
The transition functions in problems 1-9 can return -1 or -2. They only return -2 when the rest of the program state was unchanged. The organizers communicated to us that traces in which -2 is emitted indefinitely should be ignored:
The same holds by the way for invalid inputs, i.e., calculate_output returning 2 [sic]. They are not meant to be part of the traces that LTL properties talk about - a liveness property such as F oW hence does not become false alone by infinitely often executing an invalid input.
I did not see this last point made explicit on the website.
For problems 1-9, since the program state is unchanged when -2 is returned, it seems that the easiest way to comply is to treat -2 as a fatal error that interrupts the trace. We do not lose much of importance in doing so because any trace with -2 outputs can be shortened into a trace without these outputs. We do lose some inputs during such a shortening, so it is still slightly invasive and we would like to be sure that this is what the organizers intended. Indeed, the
main() function provided by the organizers does seem to treat -1 and -2 differently, and does call the inputs that lead to -2 “invalid”.
if(output == -2) fprintf(stderr, "Invalid input: %d\n", input); else if(output != -1) printf("%d\n", output);
The code above matches more or less the “if an unexpected input event is provided, an error message is printed and the ECA system terminates” statement on the competition's website. The code above does not stop on -2 but it seems clear that the “unexpected input” from the explanation is the -2 from the code and that, as informally stated, it interrupts the current trace.
You might expect programs 10-13 to follow the same pattern, but it turns out that instead of sometimes -1 and sometimes -2, they always return -2, both when the program state has been changed by the transition function and when it hasn't. You might still hope to interpret -2 as “error”, but in programs 10 and 11 it happens in the middle of traces that are given as example for the questions on the reachability of assertions. The solution page says:
error_46 reachable via input sequence [E, A]
And, when you try it:
$ gcc Problem10.c $ ./a.out 5 Invalid input: 5 1 Assertion failed: (0), function calculate_output, file Problem10.c, line 66. Abort trap
So it does not look that -2 should be treated as an error after all. In the above organizer-provided trace, the error caused by input 5 (E) does not make
error_46 unreachable (actually, I have it on good authority that error_46 is unreachable when, in the transition function,
return -2; is treated as a trace-interrupting error).
In conclusion, for problems 1-9, we treated -1 as just another output letter and -2 as a trace-interrupting error. This does not correspond to what the organizers did in their example solutions to problem 10. We still do not know what the organizers intended. The transition function returning -2 instead of -1 in problems 10-13 seems particularly inscrutable when the organizers have previously told us by e-mail to ignore the traces with finitely many non-(-2) outputs.
Our aforelinked solutions were computed with this convention. We provide them in case someone wants to use the same convention and compare their results to ours.
Are inputs and outputs simultaneous or are they different events?
Again, we have no idea. In our modelization, the output happens at the same time as the input it was computed from.
But in example traces provided by the organizers, it looks like they are distinct events!! An example trace is
[iD, oV, iB] ([oY, iC, oV, iB])*.
But if input and output are separate events, why is the first LTL formula for problem 10 expressed as
(! iA WU (oU & ! iA)) ?
(oU & ! iA) makes no sense, because if the event is any sort of output event (
U or another), then it cannot be an input event. The atom
(oU & ! iA) is just equivalent to
oU. All LTL properties that talk about inputs and outputs in the same instant are similarly nonsensical.
In our solutions, influenced by the shape of the competition's LTL formulas which talk about input and output in the same instant,
we considered that input and corresponding output are simultaneous. Both are passed to the Büchi automaton at the time of computing a transition. An input can happen together with the output -1. If the transition function aborts because of an assertion or returns -2, then the input is not seen by the Büchi automaton (in the case of the -2 output, this would be easy to change by modifying the
main() function. The trace could be interrupted just after the Büchi automaton transition instead of just before).
Is there any reachability LTL property at all in the competition?
The example solutions all provide infinite traces as counter-examples. This is consistent with the explanation “an LTL property holds for a certain ECA program if and only if all possible runs which do not lead to an error satisfy the LTL property”, but it is a very strange way to approach reachability questions (which indeed are no longer reachability questions at all). To answer a LTL question that would ordinarily have been a reachability question, such as “output
W occurs at most twice”, in this setting, one must, when
W has been encountered three times on a trace, wonder whether this trace is the prefix of at least one infinite trace. If there is a sequence of inputs that keeps the trace alive, then the trace is a counter-example. If all inputs that can be added to extend to this prefix eventually lead to an error, even if it is at a very far horizon, then the trace is not a counter-example (remember not to take into account the trace that indefinitely outputs -2 when trying to answer this sub-question).
In short, in this setting, no LTL formula ever expresses a reachability property. It does not prevent us to handle these LTL properties (Sébastien assures me that our method theoretically handles all liveness properties. We certainly handle those that are already liveness properties without the weird “all possible runs that do not lead to an error” quantification).
This discussion gives me an idea for a powerful static analysis tool. I will call it Marvin. You ask it “Is this software safe? Can this dangerous angle be reached?” and it answers “Who cares? The Sun is going to blow up in five billion years. Good riddance if you ask me. Oh god, I'm so depressed…”
As the initiator of this attempt to participate to the competition, I would like to thank my colleagues Virgile Prevosto, Boris Yakobowski and Sébastien Bardin, from whom I personally learnt a lot about various aspects of software verification; my warmest thanks are for the organizers of the 2012 RERS Grey Box Challenge.