Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - benchmarks

Entries feed - Comments feed

Friday, November 2 2012

Short, difficult programs

When researchers start claiming that they have a sound and complete analyzer for predicting whether a program statement is reachable, it is time to build a database of interesting programs.

Goldbach's conjecture

My long-time favorite is a C program that verifies Goldbach's conjecture (actual program left as an exercise to the reader).

If the conjecture is true, the program never terminates (ergo a statement placed after it is unreachable). If the conjecture is false, the program terminates and moves on to the following statement. The program can be implemented using unsigned long long integers that should be good for counter-examples up to 18446744073709551615. It can alternately use dynamic allocation and multi-precision integers, in which case, depending whether your precise definition of “analyzing a C program” includes out-of-memory behaviors, you could claim that the reachability of the statement after the counter-example-finding loop is equivalent to the resolution of “one of the oldest and best-known unsolved problems in number theory and in all of mathematics”.

Easier programs than that

No-one expects the resolution of Goldbach's conjecture to come from a program analyzer. This example is too good, because it is out of reach for everyone—it has eluded our best mathematicians for centuries. What I am looking for in this post is easier programs, where the solution is just in reach. Examples that it would genuinely be informative to run these sound, complete analyzers on. If they were made available.

With integers

For 32-bit ints and 64-bit long longs, I know that the label L is unreachable in the program below, but does your average sound and complete program analyzer do?

/*@ requires 2 <= x ;
    requires 2 <= y ; */
void f(unsigned int x, unsigned int y)
{
  if (x * (unsigned long long)y == 17)
    L: return;
}

The key is that with the aforementioned platform hypotheses, the multiplication does not overflow. Label L being reached would mean that the program has identified divisors of the prime number 17, which we don't expect it to.


In the program below, the multiplication can overflow, and not having tried it, I have truly no idea whether the label L is reachable. I expect it is, statistically, but if you told me that it is unreachable because of a deep mathematical property of computations modulo a power of two, I would not be shocked.

/*@ requires 2 <= x ;
    requires 2 <= y ; */
void f(unsigned long long x, unsigned long long y)
{
  if (x * y == 17)
    L: return;
}

With floating-point

Floating-point is fun. Label L in the following program is unreachable:

/*@ requires 10000000. <= x <= 200000000. ;
    requires 10000000. <= y <= 200000000. ; */
void sub(float x, float y)
{
  if (x - y == 0.09375f)
    L: return;
}

Frama-C's value analysis, and most other analyzers, will not tell you that label L is unreachable. It definitely looks reachable, The difference between x and y can be zero, and it can be 1.0. It looks like it could be 0.09375 but it cannot: the subtracted numbers are too large for the difference, if non-zero, to be smaller than 1.0.


So the subtlety in the example above is the magnitude of the arguments. What about smaller arguments then?

/*@ requires 1.0 <= x <= 200000000. ;
    requires 1.0 <= y <= 200000000. ; */
void sub(float x, float y)
{
  if (x - y == 0.09375f)
    L: return;
}

This time the label L is easily reachable, for instance with the inputs 2.09375 for x and 2.0 for y.


What about this third program?

/*@ requires 1.0 <= x <= 200000000. ;
    requires 1.0 <= y <= 200000000. ; */
void sub(float x, float y)
{
  if (x - y == 0.1f)
    L: return;
}

The target difference 0.1f is larger than 0.09375f, so it should be easier to obtain as the difference of two floats from the prescribed range, right? In fact, it isn't. The numbers 0.099999904632568359375 and 0.10000002384185791015625 can be obtained as values of x-y in the above program, for the former as the result of picking 1.099999904632568359375 for x and 1.0 for y. The value 0.1f, on the other hand, cannot be obtained as the subtraction of two floats above 1.0, because some bits are set in its significand that cannot be set by subtracting floats that large.

Conclusion

I expect it will be some time before automatic analyzers can soundly and completely decide, in all two-line programs, whether the second line is reachable. Frama-C's value analysis does not detect unreachability in any of the programs above, for instance (the value analysis is sound, so it soundly detects that L may be reachable when it is). Please leave your own difficult two-line programs in the comments.


The floating-point examples in this post owe quite a bit to my colleague Bruno Marre explaining his article “Improving the Floating Point Addition and Subtraction Constraints” to me over lunch.

Tuesday, October 16 2012

Exact Gap Computation for Code Coverage Metrics in ISO-C

Comparing static analysis tools is (still) difficult

Earlier this year of 2012, some of my colleagues and I took the opportunity to point out that, as a research community, we are not doing a satisfactory job of comparing static analysis tools. This article and blog post were concerned with independent third-party comparisons. Another context in which each of us can be lead to do comparisons, and to try to be as objective as possible, is the “related work” section of any traditional, non-benchmark article.

“Related work” sections

Dirk Richter and Christian Berg have designed a technique that, apparently, it makes some sense to compare to what Frama-C's value analysis allows. Here is their summary of their comparison between their tool and the value analysis:

No sophisticated examples can be handled by Frama-C’s value analysis. Some of the examples tested even cause runtime-errors in Frama-C itself, thus it is not reliable.

Richter and Berg's technique appears to tackle the problem of estimating how much code must actually be covered for all the code that can be covered to have been covered.

Frankly, it seems to me that the researchers should be comparing their tool to existing test-generation tools. They are much closer in objectives, and in the means deployed to reach these objectives. One such tool that I know of because it is developped locally is PathCrawler. I am sure there are plenty of others. I am not sure why they spend a paragraph on Frama-C's value analysis when it is absolutely not documented as trying to do this.

Possible comparisons

However, I am interested in tool comparisons, so this is a good opportunity to understand how these comparisons work and how we can make them better.


First, the claim “No sophisticated examples can be handled by Frama-C’s value analysis” is interesting, because of course what really kills advanced techniques in practice is that they do not scale to even moderate size programs. The authors point out that their own work may only be “practical” for embedded systems. Coincidentally, Frama-C's value analysis was designed to work well on embedded code, too, since we too felt this was a domain where practicality was within reach. This makes me curious as to what embedded code examples the authors experimented on before reaching their conclusion. If these examples reveal weaknesses in Frama-C's value analysis on embedded code, we would definitely like to hear about them. I would summarize this into one additional item that we did not list in our “Benchmarking static analyzers” article (although it was implict in other recommendations we did list): when doing comparisons, conditions should be made explicit and input programs should be made available. Here, this would mean providing the Frama-C version and commandline in addition to the test programs themselves. For such programs and commandline that “cause runtime-errors in Frama-C itself”, a good place to provide them is our Bug Tracking System. Instructions for reporting a bug are displayed on crashes. Otherwise, a tarball with all the embedded C code examples the authors used would be fine. A URL could be provided in the article.


Second, the authors claim that Frama-C is not reliable. This makes me want to investigate because, precisely, in 2012 we published a peer-reviewed article on the subject “Frama-C is reliable, here is how we used random program generation to make it so”. I am eager to get my paws on the authors' implementation so as to evaluate its reliability, since this is supposed to be one aspect where it does better. Unfortunately, the implementation does not appear to be available yet.


Third, it is always a difficult exercise to compare one's own tool to a different, unfamiliar one. One risks ending up with a slightly unfair conclusion, even if one tries one's hardest to be objective. I was remarking about the difficulty of finding “fair” inputs for the comparison of static analyzers in another previous post. The difficulty for static analyzers is that inputs are programs: programming languages are so expressive that two programs, even in the same language, can differ widely.

There, as a remedy, I suggested tool authors first try their own tool on the programs that the other analyzer works well on. For someone working on another technique and unfamiliar with the strengths and weaknesses of Frama-C's value analysis, Csmith-generated examples are one obvious possibility. Another possibility would be to use the programs in the 2012 RERS challenge. These programs contains assert(0); calls and, for each, an objective is to tell whether it is reachable. This seems to me to be the same question as whether a test suite needs to cover the assert(0); call. The good news is that both on Csmith-generated programs and in the RERS challenge, Frama-C's value analysis can be made to be both sound and complete, just like Richter and Berg claim their technique is. It should be simple and informative to compare the results of their tool to the results obtained with the value analysis then. Below are our solutions for problems 1-9.

Reachable assertions in problems 1-9 of the RERS competition

Below are a list of reachable lines for each problem. The assert(0) calls not listed are unreachable.

Problem1.c

error_20: assert(0);
error_47: assert(0);
error_32: assert(0);
error_37: assert(0);
error_56: assert(0);
error_33: assert(0);
error_57: assert(0);
error_50: assert(0);
error_35: assert(0);
error_15: assert(0);
error_38: assert(0);
error_21: assert(0);
error_44: assert(0);
globalError: assert(0);

Problem2.c

error_50: assert(0);
error_45: assert(0);
error_59: assert(0);
globalError: assert(0);
error_43: assert(0);
error_13: assert(0);
error_16: assert(0);
error_44: assert(0);

Problem3.c

error_45: assert(0);
error_35: assert(0);
error_52: assert(0);
error_39: assert(0);
error_9: assert(0);
error_37: assert(0);
error_43: assert(0);
error_31: assert(0);
error_28: assert(0);
error_27: assert(0);
error_50: assert(0);
error_13: assert(0);
error_26: assert(0);
globalError: assert(0);

Problem4.c

error_12: assert(0);
error_19: assert(0);
error_31: assert(0);
error_39: assert(0);
error_52: assert(0);
error_6: assert(0);
error_58: assert(0);
error_40: assert(0);
error_4: assert(0);
error_38: assert(0);
error_45: assert(0);
error_11: assert(0);
error_26: assert(0);
globalError: assert(0);
error_9: assert(0);
error_17: assert(0);
error_32: assert(0);
error_35: assert(0);
error_55: assert(0);
error_36: assert(0);
error_14: assert(0);
error_18: assert(0);
error_13: assert(0);
error_15: assert(0);
error_27: assert(0);

Problem5.c

error_0: assert(0);
error_38: assert(0);
error_57: assert(0);
error_55: assert(0);
error_58: assert(0);
error_32: assert(0);
error_13: assert(0);
error_51: assert(0);
error_33: assert(0);
error_48: assert(0);
error_18: assert(0);
error_39: assert(0);
error_1: assert(0);
error_41: assert(0);
error_37: assert(0);
globalError: assert(0);
error_11: assert(0);
error_26: assert(0);
error_15: assert(0);
error_40: assert(0);
error_36: assert(0);
error_44: assert(0);
error_30: assert(0);
error_47: assert(0);
error_24: assert(0);

Problem6.c

error_12: assert(0);
error_21: assert(0);
error_11: assert(0);
error_44: assert(0);
error_1: assert(0);
error_36: assert(0);
error_0: assert(0);
error_2: assert(0);
error_38: assert(0);
error_48: assert(0);
error_37: assert(0);
error_4: assert(0);
error_59: assert(0);
error_10: assert(0);
error_20: assert(0);
error_5: assert(0);
error_15: assert(0);
error_27: assert(0);
error_33: assert(0);
error_9: assert(0);
error_29: assert(0);
error_47: assert(0);
error_56: assert(0);
error_24: assert(0);
error_58: assert(0);
globalError: assert(0);

Problem7.c

error_58: assert(0);
error_47: assert(0);
error_5: assert(0);
error_48: assert(0);
error_19: assert(0);
error_39: assert(0);
error_36: assert(0);
error_40: assert(0);
error_35: assert(0);
error_31: assert(0);
error_9: assert(0);
error_42: assert(0);
error_7: assert(0);
globalError: assert(0);
error_11: assert(0);
error_20: assert(0);
error_44: assert(0);
error_46: assert(0);
error_18: assert(0);
error_6: assert(0);
error_23: assert(0);
error_30: assert(0);
error_3: assert(0);
error_37: assert(0);
error_15: assert(0);

Problem8.c

error_48: assert(0);
error_2: assert(0);
error_49: assert(0);
error_15: assert(0);
error_7: assert(0);
error_55: assert(0);
error_51: assert(0);
error_50: assert(0);
error_43: assert(0);
error_10: assert(0);
error_29: assert(0);
error_24: assert(0);
error_1: assert(0);
error_26: assert(0);
error_6: assert(0);
error_5: assert(0);
error_46: assert(0);
error_13: assert(0);
error_4: assert(0);
error_37: assert(0);
globalError: assert(0);
error_34: assert(0);
error_25: assert(0);
error_28: assert(0);
error_59: assert(0);

Problem9.c:

error_46: assert(0);
error_57: assert(0);
error_36: assert(0);
error_19: assert(0);
error_6: assert(0);
error_10: assert(0);
error_34: assert(0);
error_15: assert(0);
error_32: assert(0);
error_41: assert(0);
error_11: assert(0);
error_35: assert(0);
error_2: assert(0);
error_20: assert(0);
error_3: assert(0);
globalError: assert(0);
error_44: assert(0);
error_38: assert(0);
error_51: assert(0);
error_54: assert(0);
error_56: assert(0);
error_53: assert(0);
error_47: assert(0);
error_59: assert(0);
error_8: assert(0);

Friday, August 24 2012

On writing a dedicated model-checker for the RERS competition

In recent posts, I have shown that Frama-C's value analysis could answer many reachability questions, and some questions that weren't originally phrased as reachability questions, about the programs in the RERS competition.

If you are vaguely familiar with the internals of Frama-C's value analysis, and if you tried analyzing some of the competition programs yourself, you may have noticed that these analyses use only a small fraction of the plug-in's capabilities. The analyzer is only ever propagating abstract states that correspond to singletons. It does juggle with many program states, but the programs here have small states that are encoded in just a few variables (the analyzer would have been able to manipulate the states encoded on a much larger number of variables and would efficiently share in memory the values common to several of the explored states). There are no bit-fields, no structs with padding (both of which might make identical states look different if carelessly handled). The programs obviously do not execute any undefined behavior for lack of any operation that might introduce them. There is a single outermost loop. There is no arithmetic at all.

In favor of the general verifier

A specialized verifier that was designed for just these programs would have a huge opportunity to do a better job on them. On the other hand, the advantage of working on a more general verifier is that it is useful for more tasks. This enables to spend more time improving it. Some of the improvements enhance the analysis of many programs, including the specific programs built only from assignments, equality tests and conditionals considered in the competition. Some of these improvements are too sophisticated to justify implementing in a verifier that only handles programs with assignments, equality tests and conditionals, because such a verifier will never be usable to find that the SHA-3 candidate Skein's reference implementation does not exhibit undefined behavior, that AES may be susceptible to timing attacks (but Skein isn't), where a couple of bugs are in an old undocumented piece of control software, that there is a subtle memory overflow in compression library QuickLZ, or that units of code have the data and control flows mandated by specification.

What a dedicated verifier might look like

In the particular case of these programs, the same propagation done by the value analysis could be done in C, by a program that would incorporate the transition function directly and execute it as compiled code. This would be much more efficient than what the value analysis does, and in this particular case, it would give the same results. From experiments interpreting Csmith programs, the value analysis slowdown with respect to compiled code can be expected to be of the order of 10000.

Accumulating reachable states

In order to reproduce what Frama-C's value analysis does, a dedicated verifier would need to store states that have already been visited, and to be able to recognize, after applying the transition function once more, whether the obtained state was one that was already visited.

In this particular case, this could be done in constant time with a hashset. Note, however, that it is only possible to compute a hash in this specialized case because all states are “singleton” states. If some states represented several values at once, e.g. a1 ∈ {1; 2} a2 ∈ {0; 1}, the good criterion would then be whether the newly computed state is included in one of the previously propagated states. Testing inclusion in one of the previous states cannot be done in constant time with a hashset (hashsets only allow you to test whether the new state is equal to an existing state, and you need to be careful to use compatible equality and hash functions).

Frama-C's value analysis uses a data structure that is generally as efficient as hashsets when the manipulated states are singletons, and that often remains efficient when testing inclusion in one another of states that are not singletons.

Storing states that remain to be propagated

A small thing: the propagation algorithm also requires a workqueue. Any propagation order will do (Frama-C's value analysis propagation order is not specified either), but since C comes with so few data structures, I just thought I would mention it. For a dedicated verifier in C, one would need to find some linked list implementation or write eir own. Frama-C's value analysis may interpret C slower than the code can be executed once compiled, but it comes with ready to use data structures for the reached set and for the workqueue. One needs not even know they are there to use the analyzer.

Specific drawbacks

Obviously, this ad-hoc verifier could be expected to be must faster than the value analysis. This makes the experiment tempting. What prevents me from starting work on it is the lack of generality of the resulting tool. Some examples follow.

Undefined behavior

Suppose that you had implemented such a specialized verifier for the competition's program, and that you were looking for more transition functions written in C that the same verifier could apply to. You would certainly find some, but would you in general be certain that the transition function never exhibits undefined behavior (not for the first transition, and not for the transition from any reachable state to a successor)? If one isn't certain that the transition function does not cause undefined behavior, from a formal verification point of view, the tool is worthless. An undefined behavior can cause anything to happen. In particular, an out-of-bounds memory access in the transition function can mess up the verifier's reached set or workqueue, and compromise the results of the verification.


Any automatic verifier is susceptible to the caveat that a bug in the verifier can compromise results. This is different: your implementation could be flawless, but it would still be susceptible to a bug in the transition function, a bug in the system being verified.

Of course, you could, as a preliminary step in your verification, check that the transition function does not have any undefined behavior for any input state. If you find that there are a lot of different undefined behaviors in C and that it's a bother to detect them all, we have a tool that we have been working on. It also answers reachability questions.

General interference between unrelated traces

Even if you have only “frank” run-time errors to fear—undefined behavior that compilers kindly translate to run-time errors, at least when optimizations are disabled—, a run-time error in one transition will still interrupt the entire dedicated analyzer. You will notice when this happens, and you can add a guard to the particular division by zero or NULL dereference, but this iterative process may end up taking as long as the value analysis for the same result. The value analysis, when encountering division by zero or NULL dereference, makes a note of it, considers the trace as terminated, and goes on to the propagation of the next trace. The end result, a list of run-time errors to worry about and a reachable states set, is the same, but it is obtained in a single pass.

There is also the issue of non-termination of the transition function. The value analysis detects cases of non-termination more or less quickly; again, when the infinite loop is detected it simply goes on to the next trace. With an ad-hoc verifier, if you expect the verification to take days, it may take days before you realize that the verifier is executing an infinite loop in its third considered transition.

Conclusion

In summary, considering the general C verification framework we already have, it looks like it wouldn't be a good investment of our time to develop a dedicated fast verifier for the competition—although it would provide an insightful datapoint if someone did.

Perhaps participating in the meeting will convince us that Event Condition Action (ECA) systems are more generally useful than our current experience has led us to believe. We could work on a verifier for them if there is a need. There is not much code to reuse from a general-purpose C abstract interpreter. There would be ideas to reuse, certainly. I will come to one of them in the next post.

Thursday, August 23 2012

Technical interlude

The current series of posts is pretty technical, huh? Here is a refreshing diversion in the form of a technical post that also touches on existential questions. What is this blog for? We don't know.

Best solutions, or solutions anyone could have found

If you have been following the current series of posts, you may have noticed that a recurring dilemma is whether the goal is to demonstrate the capabilities of the software tool under consideration (Frama-C's value analysis plug-in) when pushed to its limits by its authors, or to demonstrate what anyone could do without being a creator of the tool, taking advantage of Linux, Mac OS X and sometimes Windows packages for the software, and, most importantly, having only the available documentation to understand what the software makes possible.


The general goal of the blog is to empower users. In fact, much of the aforementioned documentation is available as posts in the blog itself. These posts sometimes describe techniques that were undocumented until then, but describe them so that they wouldn't be undocumented any more. Some of the posts even describe techniques that weren't undocumented at the time of writing. As my colleague Jean-Jacques Lévy would say, “repetition is the essence of teaching”.


However, I do not expect that most software tool competitions are about measuring the availability and learnability of tools. For instance, the SV-COMP competition explicitly forbids anyone other than the authors of the tool to participate with it:

A person (participant) was qualified as competition contributor for a competition candidate if the person was a contributing designer/developer of the submitted competition candidate (witnessed by occurrence of the person’s name on the tool’s project web page, a tool paper, or in the revision logs).

Ostensibly, the tools are the participants in the competition. Involving the authors is only a mean to make the competition as fair as possible for the tools. Some colleagues and I have expressed the opinion that, for static analysis, at this time, this was the right way:

In general, the best way to make sure that a benchmark does not misrepresent the strengths and weaknesses of a tool is to include the toolmakers in the process.


Some competitions on the other hand acknowledge that a participant is a person(s)+tool hybrid, and welcome entries from users that did not create the tools they used. This is perfectly fine, especially for the very interactive tools involved in the cited competition, as long as it is understood that it is not a tool alone, but a hybrid participant, that is being evaluated in such a competition. Some of the participating tools participated in several teams, associated to several users (although the winning teams did tend to be composed of toolmakers using their respective tools).


It is not clear to me where on this axis the RERS competition falls. It is a “free-style” competition. The entries are largely evaluated on results, with much freedom with respect to the way these were obtained. This definitely can also be fun(*).

The competition certainly does not require participating tools to be packaged and documented in such exquisite detail that anyone could have computed the same results. As such, the best description to write is the description of what the authors of the value analysis can do with it with all the shortcuts they trust themselves to take. If some of the questions can be solved efficiently using undocumented features, so be it. It is the free-style style.

But the mission of the blog remains to document, and showing what we have done without explaining how to reproduce it sounds like boasting. I am torn.


(*) An untranslatable French idiom, “bon public”, literally “good audience”, used in an adjective position, conveys a number of subtle, often humorous nuances. I think this applies to me and software competitions. They just all seem fun to me.

A necessary built-in

A value analysis built-in whose usefulness, if it were available, became apparent in the last post was a built-in that halts the analyzer.

Several analysis log snippets in the previous post show a ^C as last line, to indicate that the log already contains the answer to the question being investigated, and that the user watching the analysis go by can freely interrupt it then.

I cheated when editing the post, by the way. I am not fast enough to interrupt the analysis just after the first conclusive log message. But I did interrupt these analyses, which otherwise might have gone on for a long time for no purpose.

I thought I would enjoy my colleagues' horrified reaction so I told everyone about my intention of writing a value analysis built-in, like some already exist for memcpy() or memset(), for halting the analyzer. Not halting the currently considered execution, mind you. There is /*@ assert \false; */ for that, and if you need this functionality available as a C function, you can put /*@ assert \false; */ inside your own function, like I did for assert() earlier when prepping the competition programs for Frama-C.


Clearly, such a built-in does not have a contract, its effects are meta-effects that transcend the framework (in what would, for these colleagues, be the bad sense of the word).

A time-saver

Today I set out to write this built-in, because I started to get serious about the competition and it would be convenient to just instrument the program with:

if ( /* current state identical to predecessor state /* )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_halt_analysis();
}

instead of having an operator, or even a shell script, watch the log for the Frama_C_show_each_cycle_found message and interrupt Frama-C then.


Soon after opening the file and finding my place, I realized that this feature was already available. I am telling you this because this blog is for the benefit of transparency, for documenting things and for empowering users, but mostly because I want to see my colleagues' horrified reaction when they read about this trick. Actually, I would not use it if I were you. It might cease to work in a future version.

if ( /* current state identical to predecessor state /* )
{
  Frama_C_show_each_cycle_found(1);
  Frama_C_cos(0.0, 0.0);
}

You see, Frama-C value analysis built-ins, such as Frama_C_cos(), do not have an intrinsic type. You can include a header that will give them a type if you wish:

double Frama_C_cos(double x);

Otherwise, the old C89 rules for missing function prototypes apply. During parsing, a type is inferred from the types of the arguments at the call site.

Only if and when the analysis actually reaches the Frama_C_cos(0.0, 0.0) call, the analyzer will realize that it is passed arguments that are incompatible with the OCaml function that implement the built-in. An exception will be raised. Since there is no reason to try to recover from a built-in misuse, this exception is caught only at the highest level. It stops the entire analyzer (it marks the entire analysis results as unreliable, as it should, but of course it does not retroactively unemit the Frama_C_show_each_cycle_found message in the log that says a conclusion has been reached).


The result looks like this, this time without cheating:

[value] Called Frama_C_show_each_cycle_found({1})
...
[value] computing for function any_int <- main.
        Called from Problem6_terminateU_pred.c:9516.
[value] Done for function any_int
[value] Semantic level unrolling superposing up to 1400 states
[value] user error: Invalid argument for Frama_C_cos function
[value] user error: Degeneration occured:
        results are not correct for lines of code
        that can be reached from the degeneration point.
[kernel] Plug-in value aborted because of invalid user input.

This is convenient for timing the analyzer as it tackles some of the harder problems. In the case above, the time was 5m35s answering a liveness question about Problem 6. It would have been a pain to watch the log for that long, fingers poised to interrupt the analyzer.


That's all for this interlude. Expect exciting timings in the next post. There may also be talk of undocumented features that transcend the problem, in the good sense of the word.

Wednesday, August 22 2012

Proving (F oW) true or false: alternative method

In the context of the RERS 2012 competition, the previous post points out that value analysis option -obviously-terminates can be used to prove the termination of an instrumented program, created such that its termination implies that the property (F oW) is true of the original competition program. Unfortunately, this only works in one direction. If the property (F oW) does not hold of the original competition program, this approach cannot be used to prove that. The previous post continued to offer a method, involving a counter, that works in both directions.

Preliminary remark regarding the counter method

The second method proposed in the previous post requires a state count to use as bound. I suggested to use the number of reachable states of the original (uninstrumented) program, obtained by running Frama-C's value analysis as a sound and complete abstract interpreter.

I would like to point out that instead of the state count of the original program, it is possible to use the state count of the program instrumented to stop when W is emitted. This number is by definition lower. On the other hand, the number of states of the uninstrumented program can be computed once and for all, and then be used for all formulas of the form (F oW) that apply to that program in the competition. In fact we can obtain that number for free when we check the reachability of assert(0); statements for the competition. So there is a trade-off there.

Applying the method of last post

Let us now consider Problem3.c (small-hard). One of the properties to evaluate is (F oY). The first method described in the previous post does not conclude rapidly, so there is reason to suspect that perhaps the property is false for this program. But if this is going to be our answer in the competition, we had better establish that the property is definitely false, that is, there exists an infinite execution along which Y is not emitted.

Instrumenting with a counter

To establish that the property is false using a counter, we instrument Problem3.c thus:

--- Problem3.c	2012-08-08 10:49:53.000000000 +0200
+++ Problem3_terminate_at_Y_with_counter.c	2012-08-22 13:58:30.000000000 +0200
@@ -1,9 +1,8 @@
 void assert(int x)
 {
   if (!x)
     {
-      Frama_C_show_each_assert_reached();
       /*@ assert \false; */
     }
 }
 
@@ -1649,15 +1648,16 @@
 	    } 
 	    if((((((((!(a26==1)&&(a6==1))&&!(a27==1))&&(a12==1))&&(a3==0))&&(a11==1))&&(a5==1))&&(a18==1))){
 	    	error_2: assert(0);
 	    } 
-	    return -2; 
+	    /*@ assert \false ; */
 	}
 
 int main()
 {
     // default output
     int output = -1;
+    int counter = 0;
 
     // main i/o-loop
     while(1)
     {
@@ -1668,6 +1668,10 @@
 
         // operate eca engine
         output = calculate_output(input);
 
+	/*@ assert output != y ; */
+
+	counter++;
+	Frama_C_show_each_counter(counter);
     }
 }

Obtaining a bound

In order to obtain a bound, we analyze the original program, using the value analysis as a model-checker:

$ frama-c -val -slevel 100000000 Problem3.c
...
[value] Semantic level unrolling superposing up to 200 states
...

This message means there are up to a number between 200 and 300 states attached to a single statement during the analysis of the program. That statement is probably the statement just after a new input has been picked non-deterministically between 1 and 6, and the number therefore six times higher than the number of states that interests us. The abstract interpreter also distinguishes states program that differ only for variable output, again making the bound displayed in the “superposing up to” messages higher than the number we mean. But let us not try to compensate for that, just to be on the safe side.


I kid, I kid. Using the “superposing up to” message is not rigorous, since this message was not designed for that (and the message is not documented at all, I think). As a Frama-C developer with knowledge of value analysis internals, I am tempted to use 300 as a safe bound, but in order to obtain a rigorous number of states, anyone can insert the following statement in the while loop of the program:

Frama_C_show_each_state( a1, a4, a0, a15, a29, a10, a16, a22, a2, a17, a25, a7, a14, a19, a20, a8, a23, a21, a24, a13, a9, a28, a26, a6, a27, a12, a3, a11, a5, a18 );

Once filtered, sorted and de-duplicated, the list of states is exactly 31 long. Once de-duplicated, the messages printed by the call correspond exactly to unique states of the program. This method provides an exact number and it uses documented value analysis features only. The list is provided at the end of the post for the doubters.

Looking for an execution with more than 31 transitions

Knowing that we are looking for an execution where counter goes higher than 31, we can now launch the analysis of the version instrumented with counter:

$ frama-c -val -slevel 100000000 -no-val-show-progress Problem3_terminate_at_Y_with_counter.c
...
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
[value] Called Frama_C_show_each_counter({1})
Problem3_terminate_at_Y_with_counter.c:5:[value] Assertion got status invalid (stopping propagation).
[value] Called Frama_C_show_each_counter({2})
...
[value] Called Frama_C_show_each_counter({31})
...
[value] Called Frama_C_show_each_counter({31})
[value] Called Frama_C_show_each_counter({32})
...
[value] Called Frama_C_show_each_counter({32})
^C

This means that there is at least one infinite execution in the instrumented program, which means that the original program can execute infinitely without emitting Y.

A method not involving a transition counter

An irritating feature of the previously described method is that it does not immediately detect infinite executions that are right under its nose. There could exist, for a certain input, a transition from the initial state to itself, and the analyzer would still have to explore all other transitions, breadth-first, down to a depth of 31, before displaying the Called Frama_C_show_each_counter({32}) message that confirms the existence of an infinite execution.

A method to look for this cycle of length one would be to instrument the program differently. This is what I describe now.

Add variables to the program for the description of the predecessor state

This means transforming the declarations

	int a1 = 1;
	int a4 = 1;
...

into:

	int a1 = 1, p1 = -1;
	int a4 = 1, p4 = -1;
...

This can be done with a regexp.

Check whether a newly reached state coincides with its predecessor state

At the end of a cycle, we compare the new state with the predecessor state. Again, code doing this can be generated with a regexp:

        if 
        ((a1 == p1) &&
        (a4 == p4) &&
...
        (a18 == p18))
          Frama_C_show_each_cycle_found(1);

The above causes a message to be emitted by the analyzer as soon as a cycle is found.

Update the predecessor state

After checking whether the new state coincides with the predecessor state, it is time to update the predecessor state. Yada regexp yada:

        p1 = a1;
        p4 = a4 ;
...
        p18 = a18 ;

And presto: we detect all cycles. All cycles of length one, that is.

Maybe update the predecessor state

Although it would work great on the example that motivated this new method, detecting only cycles of length one is a little bit too specific. Let us fix that.

At the end of a cycle, instead of updating the predecessor state as above, let us maybe update it as below:

    if (any_int())
    {
            p1 = a1;
            p4 = a4 ;
...
            p18 = a18 ;
    }

With the above non-deterministic update of the predecessor state, for each vector a1, a4, ...a18 of variables propagated by the value analysis, the variables p1, p4, ...p18 contain a nondeterministic superposition of all its predecessors. Think of it as programming a quantum computer, and if you think of it this way, remember that the variables p1, p4, ...p18 are entangled: when observed, they contain matching values from one specific predecessor. They are entangled because the conditional above either update variables p1, p4, ... all at once or leaves them all unchanged.


On the other hand, different predecessors do not get mixed together because... well, because the analyzer remains complete on this program even after it being instrumented thus, for reasons I still do not want to document. Nevertheless, the important part of the argument is that a sound and complete analyzer would find the Frama_C_show_each_cycle_found(1); statement above reachable if and only if in the program instrumented to stop at Y, a state is identical to one of its predecessors. Hopefully it is possible to convince oneself of that much without being a value analysis expert—although I admit that programming nondeterministic machines is an unusual exercise.


Using Frama-C's value analysis as a sound and complete abstract interpreter, the statement Frama_C_show_each_cycle_found(1); will be reached, and a message be displayed, exactly when a state is reached that is identical to one of its predecessors. This happens if and only if there is a cycle in the transition graph of the program instrumented to stop at Y. In terms of the original program, this corresponds to (F oY) being false.

Putting the new method to the test

Running the value analysis on the program instrumented as described:

$ frama-c -val -slevel 100000000 -no-val-show-progress Problem3_terminate_at_Y_with_predecessor.c 
...
[value] Semantic level unrolling superposing up to 100 states
[value] Semantic level unrolling superposing up to 200 states
[value] Called Frama_C_show_each_cycle_found({1})
^C

Incidentally, on this example, this new method is much faster than the method involving a counter. It is not clear that it should always be the case, and on difficult examples, the two methods can perhaps be tried in parallel.

Annex: an exhaustive list of states for Problem 3

Below is the list of unique states displayed when the statement

Frama_C_show_each_state( a1, a4, a0, a15, a29, a10, a16, a22, a2, a17, a25, a7, a14, a19, a20, a8, a23, a21, a24, a13, a9, a28, a26, a6, a27, a12, a3, a11, a5, a18 );

is inserted in the while loop of Program3.c.

[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{0},{1},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{0},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{0},{1},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{0},{1},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{0},{1},{1},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{1},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{1},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{0},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{0},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{1},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{0},{1},{1},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{0},{1},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{2},{0},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{0},{2},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{0},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{1},{1},{1},{1})
[value] Called Frama_C_show_each_state({1},{1},{1},{1},{1},{1},{1},{1},{0},{0}, {0},{0},{0},{0},{0},{0},{0},{1},{1},{1}, {1},{1},{1},{1},{1},{1},{2},{0},{1},{1})


This post benefited from discussions with Sébastien Bardin, Sven Mattsen, Benjamin Monate and Boris Yakobowski.

Tuesday, August 21 2012

Participating in the RERS 2012 competition: liveness questions

This third installment discusses the diagnosis of liveness properties of particular C programs, using Frama-C's value analysis, for the RERS 2012 competition.


[Note added in September 2012: the diagnostic found for the example in this post disagrees with the diagnostic found later and sent to the competition organizers. I am unable to reproduce the result obtained here. Perhaps it was obtain with a version of Frama-C that had a bug.]

Liveness properties 101

Here is another extract of two properties found in the property list attached to Problem 1 in the competition:

(G ! oW)
output W does never occur
----------------------------------------

(F oW)
output W occurs eventually

To tell you how unaccustomed to temporal properties I am, I first read the second formula as a negation of the first one. There are redundancies in the lists of formulas to evaluate, and I thought this was one more of them. It is of course not the case: (F oW) is not the negation of (G ! oW).

(G ! oW) says that there is no execution path that leads to W being output; the negation of that is that there is at least one such execution path. By contrast, (F oW) expresses that all execution paths have a W being output at some point, a much stronger property.


Indeed, (F oW), expressing that a state in which W is output is eventually reached, is a typical liveness property. Checking this kind of property is at odds with traditional abstract interpretation practices, where the analysis is considered done when a fixpoint has been reached: the fixpoint does not tell whether all executions eventually leave the loop. It only tells what is guaranteed for those executions that leave the loop.

Semi-solution: a simple program transformation

Please bear with me and consider the simple modification of Problem1.c below, applied on top of the previous instrumentation:

--- Problem1.c	2012-08-08 10:15:00.000000000 +0200
+++ Problem1_terminate_at_W.c	2012-08-21 17:00:38.000000000 +0200
@@ -1,9 +1,8 @@
 void assert(int x)
 {
   if (!x)
     {
-      Frama_C_show_each_assert_reached();
       /*@ assert \false; */
     }
 }
 
@@ -594,6 +593,8 @@
 
         // operate eca engine
         output = calculate_output(input);
 
+	// continue only if output not W yet
+	/*@ assert output != w ; */
     }
 }

We remove the user message in our assert() function because we want to temporarily forget about assertions. Besides, by instrumenting the main loop, we make the program stop when the output is W.

You should be able to convince yourself that the modified program terminates if and only if the original program does not admit any infinite execution that never outputs W (for any infinite sequence of values assigned to variable input).


Earlier posts (1, 2) in this blog pointed out how Frama-C's value analysis could be made to forget that it was an abstract interpreter, and verify the termination of programs.

Application

Since everything is ready for this particular property of Problem1.c, let us run the verification. The command to prove that the program terminates is as follows:

$ frama-c -val -obviously-terminates -no-val-show-progress Problem1_terminate_at_W.c
...
[value] Semantic level unrolling superposing up to 400 states
[value] done for function main
[value] ====== VALUES COMPUTED ======

We should have used a timeout, because the analysis fails to terminate when the analyzer is unable to prove that the target program terminate. Nevertheless, we were lucky, the target program terminates, the analyzer is able to verify that the program terminates, and we can conclude that there are no infinite executions of the original Program1.c that fail to emit W.

Assertions and consequences on liveness properties

The previous section only tells us that there is no infinite execution that does not emit W. Depending on the organizers' intentions, this may or may not answer the question of whether (F oW) holds.

It still depends whether you count an execution that reaches one of the assert(0); in the program as, well, an execution. If it count as an execution to which the formula should apply, then an execution that does not emit W and terminates on such an assert(0); certainly does not eventually output W.

There is an argument to say that it shouldn't count, however: although it is impossible to understand what these programs are supposed to do, assertions usually characterize illegal behaviors. Perhaps the LTL formulas are to be evaluated only on the legal, infinite behaviors of the system?

In the latter case, we are done: (F oW) holds because W is eventually emitted in all legal executions. In the former case, the additional question is whether an original assert(0); is reached before W is output, which is a simple reachability question on the instrumented Problem1_terminate_at_W.c. The answer is then that (F oW) does not hold, because there are plenty of assert(0); calls that are reached without W having been emitted.


We e-mailed the organizers and they said it was the first, simpler case. Only infinite executions need be taken into account for evaluating LTL formulas, and interrupted executions can be ignored. Our answer for the formula (F oW) of Problem 1 in the competition is therefore that the property holds.

Proving that something does not eventually happen

In the first part of the post, we set out to prove that the liveness property (F oW) was true. Verifying that the property is false requires a different approach. With the toolset provided by Frama-C, I only see the scheme below. We reduce the problem to the seemingly harder but actually easier problem of computing the maximum number of iterations before W is emitted.

  1. Obtain the number N of reachable states for the system by running the value analysis on the original program.
  2. Instrument the program, as previously making execution terminate when W is emitted, and also maintaining a transition counter (incremented at each iteration of the loop).
  3. Using sound and complete propagation on the instrumented program, either reach a fixpoint (with a finite bound computed for the counter) or find an execution with N+1 transitions that does not emit W. The former case means that the program eventually emits W. The latter that, in the transition graph, there is a reachable cycle along which W is not emitted, and therefore an infinite execution that does not output W.


This technique likely gets pretty heavy in practice. Adding a counter that may go up to N+1 to a system that already has N states can probably cause the new system to have of the order of N² states. This segues into at least one, perhaps two more blog posts, to answer the questions “what is the value analysis actually doing when it is applied on these questions?”, “how much work would it be to do the same thing closer to the metal, executing the C function as compiled instead of laboriously interpreting it?”, and “why don't we make a C model checker while we are at it?”

Monday, August 20 2012

Participating in the RERS 2012 competition: reachability questions

This post is a follow-up to this one. It begins to describe our participation, using Frama-C's value analysis, in the RERS 2012 competition.

In theory

The competition features nine target programs of increasing difficulty. Some of the questions to be answered about the target programs are reachability questions. This is the case for all calls to assert(0); in the program: for each call, the question is exactly whether it can be reached in an execution.

In addition, some LTL formulas are offered about each target program. The goal here is to individually diagnose each LTL formula as true or false when applied to the program. Some of the LTL formulas actually express reachability properties. Consider the following property, from the list that applies to problem 1 (small-easy):

(G ! oU)
output U does never occur

This formula holds if a program state in which U is output is never reached. It suffices to instrument the program a little bit to recognize when this happens, and the problem is reduced to a reachability problem, only in reverse: the property holds if the states are unreachable, and does not hold if one of the states can be reached.

In practice

The good news is that the value analysis is, at its core, designed to answer reachability questions. More specifically, it is designed to answer unreachability questions: it computes supersets of the values of variables that can reach each point in the target program, and it warns if dangerous instructions can be reached. It can guarantee that an assert(0); call is not reached.


Preparing the program for analysis as described in the manual and many times on this blog may result for instance in the following changes: removing the headers, defining a function assert(), removing the printf() calls so that one does not need to provide the function.

--- Problem1.c~	2012-07-31 01:32:42.000000000 +0200
+++ Problem1.c	2012-08-08 10:15:00.000000000 +0200
@@ -1,5 +1,12 @@
-#include <stdio.h> 
-#include <assert.h>
+void assert(int x)
+{
+  if (!x)
+    {
+      Frama_C_show_each_assert_reached();
+      /*@ assert \false; */
+    }
+}
+
 
 	// inputs
 	int a= 1;
@@ -582,14 +589,11 @@
     {
         // read input
         int input;
-        scanf("%d", &input);        
+	input = any_int();
+/*@ assert input == 1 || input == 2 || input == 3 || input == 4 || input == 5 || input == 6 ; */
 
         // operate eca engine
         output = calculate_output(input);
 
-        if(output == -2)
-        	fprintf(stderr, "Invalid input: %d\n", input);
-        else if(output != -1)
-			printf("%d\n", output);
     }
 }

We assume that an execution that reaches an assert(0); is interrupted, and that input at each cycle is a number between 1 and 6. This corresponds to our understanding of the description of the programs in the competition.


As previously hinted, the value analysis is designed to answer reachability questions negatively only. It is intended to guarantee that dangerous operations are unreachable (i.e. the program is safe). In doing so, it computes whether any statement is reachable, even non-dangerous ones, and with what variable values each of these statement is reached.

In static analysis, this is called “soundness”. The value analysis does not in general guarantee that any statement is reachable. However, for the particular program above, instrumented in this particular way, with the right command-line options, the value analysis is “complete” in addition to being sound: in addition to only diagnosing as unreachable statements that are actually unreachable, it only diagnoses as reachable statements that are actually reachable.

Being complete in addition to being sound means that the value analysis is able to answer the question (G ! oU) negatively as well as positively. If the value analysis was only sound, it could only guarantee that a particular assert(0); call is not reached, or that U is never output. Being complete means that it can in addition guarantee that an assert(0); call is reached in some execution or that U is output in some execution.

More formulas to evaluate

There are more complex LTL formulas in the list that express reachability properties, sometimes modulo trivial instrumentation. Below are two of them:

(! oZ WU (oZ WU (! oZ WU (oZ WU (G ! oZ)))))
output Z occurs at most twice
----------------------------------------

(! oU WU oZ)
output U does never occur before output Z

The second formula expresses that a state where U is emitted without Z ever having been emitted is not reached. To verify this property, one simply needs to add to the target program a boolean variable that remembers whether Z has been emitted before.

The first is a complex LTL formula with a simple English interpretation. The program can be instrumented with a counter for the number of times Z is output, and the property to check is whether a state where Z has been emitted three times is reached.

Do not make completeness claims about Frama-C's value analysis at home

You may have noticed that the value analysis' manual rarely offers precision guarantees, and in particular never describes the conditions in which it is complete.

Therefore, an independent participant in the competition who wasn't a value analysis developer would not be able to assert that the value analysis is behaving completely. Such a participant could only hope to answer reachability questions negatively, finding that (G ! oU) is true (when it is actually true) or finding that a particular assertion is never reached (when it is actually unreachable). This is the direction that matters when trying to establish that error states are unreachable, to verify the safety of software. The competition organizers implicitly acknowledge that this kind of property can be difficult to establish otherwise when they say, about weighting answers:

The weighting scheme gives you a way to express your personal confidence in your results. e.g. if you have found a feasable path to some error, this is safe, and you should weight it 10. Lifeness properties or stating that certain errors are non-feasible is of course more risky.

Here we are using a sound analyzer to establish that certain errors are non-feasible, so in our case there is little risk involved in attributing the maximum weight to these answers. There is in fact more risk when answering that some execution path leads to some error since the value analysis, even when it happens to be complete, is not designed to produce a trace that we could double-check. But we are confident for these errors too because of all the Csmith testing that we applied earlier in the year.


And by the way, this blog post does not count as documentation: it is still undocumented in what circumstances Frama-C's value analysis is complete. The documentation is long enough trying to describe how to use the value analysis soundly; and this is despite soundness being its defining property. There is no point in enumerating the very specific circumstances in which it is complete. The conditions are unlikely to occur simultaneously in real life, and if you think that this competition contradicts this statement, have another look at the shape of the competition's target programs.

Monday, August 6 2012

Understand LTL? Join us!

Here is yet another software verification competition. If you are a specialist of the verification of temporal properties, and you have been regretting not to snatch that easy Xbox 360 (with Kinect!) in 2011*, this is your chance to make up for it!


We have the reachability part of the problems nearly done** and we are looking for someone who can reduce (at least some of) the temporal properties in the challenge to something that can be verified with the value analysis. We expect that part to be expensive, so if you know how to do it, we should get together and start on this as soon as possible.


(*) I am not saying that a deep understanding of temporal properties would have helped you much in 2011, but it cannot hurt to be able to think clearly about these things

(**) No, really, 95%. We're almost there.

Friday, July 27 2012

Oxygen is stricter about types and why you should get used to it

I have just sent a list of changewishes (1, 2) to a static analysis competition mailing-list, and that reminded me of a blog post I had to write on the strictness of the type-checker in upcoming Frama-C release Oxygen. This is the blog post.

This post is not about uninitialized variables

The static analysis competition in question evaluates each participating tool on many testcases. A good number of these testcases are rejected by the current Frama-C. This is why I was writing to the mailing-list: Frama-C cannot participate in the competition if the testcases are not changed (Frama-C does not have to participate, of course. It's just that it looks fun and we would probably like to be in it).

In fact, many of the testcases were already problematic for Frama-C version 20111001 (Nitrogen), the version in the current Debian, Ubuntu, FreeBSD, NetBSD and other Unix distributions. Indeed, a lot of the testcases rely on uninitialized variables for entropy, which this post by Xi Wang and this post by Mark Shroyer show is wrong. Instead of the problem that is supposed to be detected (or not), Nitrogen detects the uninitialized use. I covered this already; this blog post is not about uninitialized variables (keep reading!).

This post is about C type-checking

While trying to get a list of uninitialized-variable-using testcases, I realized that something had changed since my last evaluation of the competition's benchmarks. Many of them were now rejected at type-checking!

The new problem is, many testcases in the benchmarks call functions without having provided a prototype, and some ultimately define the called function with a type incompatible with the type inferred at the call site. Frama-C used to be lenient about typing issues, but after fixing one soundness bug too many that was caused by the leniency, we have decided to throw in the towel. Virgile described one of the typing issues for which Frama-C used to be too lenient. It was not the only one.

This is an unusual position to take. In the article A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World, the makers of a commercially successful bug-finding tool state:

Our product's requirements roughly form a "least common denominator" set needed by any tool that uses non-trivial analysis to check large amounts of code across many organizations; the tool must find and parse the code, [...]

The above is wise: to be maximally useful, the static analyzer should choke on as few idiosyncrasies as possible. The analyzed code can be assumed to compile with some compiler, and to go through some tests (there must still be a few of these, right)? Why warn when a function is called without a prototype, if the compiler accepted it? Why reject when the function's implementation has a type that is incompatible with the type that was inferred at the call site? This is probably not the issue the user is looking for (if it was, the user would have fixed it when eir compiler warned for it).

Oxygen is strict and that's it

Well, our answer is that we tried and we found that it was too much work to try to be sound and precise with these constraints, as exemplified by Virgile's blog post. Show us a tool that accepts your weakly typed program, and we will show you a tool that probably isn't sound and precise at the same time (we have kept the examples that led to the decision. These examples demonstrate real issues masked by lenient typing. If you think you have found a sound analyzer that is at the same time conveniently permissive on typing, it will be our pleasure to provide the examples for you to try).


We hope that Frama-C will still be useful with these new restrictions on typing. Fortunately for us, there are more real worlds than the expression “the Real World” in the cited article's title might lead you to think (and this quip should not be taken as a reproach towards the authors of “A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World”, a very useful document with an appropriately chosen title considering its intended public).

Thursday, June 28 2012

Undefined behavior is a harsh mistress

Mark Shroyer has discovered an interesting consequence of undefined behavior in the compiled version of a C program: a variable behaves as both true and false, as a result of being uninitialized.

The post is great, and could not come at a better time. I needed, for the talk I will give on Saturday morning at COMPARE2012, convincing examples that some undefined behaviors cannot be recovered from. For a long time, I had wanted an example that “uninitialized” was not equivalent to “holding an unknown value”, and this is just it. Once again, procrastination wins!

One a side note, should I be worried that I am building tolerance to the panic of not having one's slides ready for an upcoming talk? If I can't even get in a frenzy in the last few days before the talk, I may just become useless. Perhaps it's time for retirement.


The only disappointing aspect of Mark's blog post is the general reaction to it, in the blog's comments or elsewhere. Guys, Mark obviously knows what undefined behavior is! He is not blaming GCC for not giving a consistent value to his variable p. Using uninitialized memory is a bug in his program; it's a bug he has been looking for, and the consequences of it were funny, so he shared. No need to get all professorial about it.

The so-stupid-it's-funny prize goes to the comment below.

First, you demonstrate ignorance talking about C language with the “bool” data type.

Dude, Mark demonstrates his mastery of C by properly using stdbool.h as part of a sound discussion of undefined behavior in a C program.


If you are in the Manchester area and are reading this, please drop by on Saturday. Gate-crash the conference if you have to. You can always blame my bad influence if you get caught.

- page 1 of 2