Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Wednesday, November 7 2012

Debugging with WP

Initial setting

So, I was checking a small demo of the WP plug-in the other day, just before making a presentation of Frama-C to some future fellow user. This was the traditional binary_search verification presented in the Berlin training session in 2010, but using the WP plugin instead of Jessie. For some reason, I decided to change the type for all the offsets from int to unsigned int. After all, we're morally dealing with array indexes and we want to search very big arrays (with length > INT_MAX, don't we?), what can possibly go wrong?

The code and its contract then become the following:

/*@ 
    requires length > 0;
    requires \valid(a+(0..length - 1));
    requires sorted: \forall integer i,j; 0<=i<=j<length ==> a[i]<=a[j];
    behavior find:
    assumes \exists integer i; 0<=i<length && a[i] == key;
    ensures 0<= \result < length;
    ensures a[\result] == key;
    behavior no_find:
    assumes \forall integer i; 0<=i<length ==> a[i] != key;
    ensures \result == -1;
 */
unsigned int binary_search(int* a, unsigned int length, int key) {
  unsigned int low, high;
  low = 0; high=length-1;
  while (low <= high) {
    unsigned int med = (high-low)/2U+low;
    if (a[med] == key) return med;
    if (a[med] < key) low = med+1;
    else high = med-1;
  }
  return -1;
}

Loop invariants

As all of you know, deductive verification require to write loop invariants to have a slight chance of proving something. An appropriate set of invariants for our case is the following:

  • low is between 0 and high + 1 (at the loop exit)
  • high is less than length
  • all cells before low contain elements smaller than key
  • all cells after high contain elements greater than key

and for the termination, we have

  • the difference between high and low strictly decreases.

Translated into ACSL, with named loop invariants in order to easily identify them in WP's output and without forgetting the loop assigns (WP is a bit picky about that), this becomes:

  /*@
      loop invariant low: 0<=low<=high+1;
      loop invariant high: high < length;
      loop invariant smaller: \forall integer i; 0<=i<low ==> a[i] < key;
      loop invariant greater: 
           \forall integer i; high < i < length ==> a[i] > key;
      
      loop assigns low, high;

      loop variant high - low;
   */

A wrong specification!

we launch Frama-C with the following command line:

frama-c-gui -wp -wp-split -wp-rte binary_search.c
  • -wp asks WP to generate proof obligations (POs) and to launch the default prover (Alt-ergo) on them
  • -wp-split splits POs that are conjunctions, leading to smaller (and hopefully easier to prove) formulas to the provers
  • -wp-rte asks RTE to generate all safety assertions before WP is launched. In other words, we will get proof POs for potential run-time errors.

The result is encouraging, but not perfect: behavior no_find is not proved, and if we dig a little bit in the WP Proof Obligations panel, we see that the branch where the PO fails is the one where we return -1, exactly what we specified:

debugging_with_WP.png

Of course, returning -1 is sensible for indicating failure when we have ints, not so with unsigned... In that case, we can return length to indicate that no suitable index has been found

Subtraction is not harmless

We thus correct the ensures clause and the return statement according to our new error value and relaunch frama-c-gui. This time, almost everything is fine, except from some tiny assertion from RTE, when assigning med-1 to high:

assert rte: med-(unsigned int)1 ≥ 0;

As a matter of fact, if we think of it a little further, if the key is less than all elements of the array, med will become 0 at a certain point, and we'll try to assign -1 to high, which is not exactly what we want (and is likely to result in a buffer overflow at the next loop step if length<UINT_MAX/2). We thus modify our else clause that way:

else { if med == 0 break; high = med - 1; }

Getting a correct program

This time, all POs are discharged, meaning that the program is correct with respect to its specification. I guess this was yet another example that no change in the code, as small as it looks like, can be considered harmless, and that looking at the unproved POs of WP can be an effective way to catch those issues.

Friday, August 3 2012

assume and assert

The previous post links to a message from Michał Moskal highlighting ACSL constructs that the VCC developers at Microsoft Research had either regretted the absence of or found superfluous while re-designing their own annotation language for VCC. In that e-mail, the third item in the “missing” list was:

ACSL is surprisingly missing /*@ assume ... */, we've found it tremendously useful when debugging specifications.

The feedback VCC developers sent us caused us to review our design choices and nudge ACSL a bit. Not having an assume annotation, however, was a conscious decision that remained. I am going to give an explanation that makes it look as if the exclusion is part of a larger, well though-out plan. I did not think of this explanation at the time, and it only came to me recently while talking about something else.

I am not sure what this says about the plan. At some level the plan was there all along, but for some reason we failed to explain it well, perhaps especially in these circumstances when it would have been useful.

The assume annotation in VCC

Design-by-contract is also sometimes called “assume-guarantee reasoning” (if you know the precise difference between these two, please leave a comment. The only reason I see for separate denominations is that “Design by Contract” seems to be a registered trademark of Eiffel Software in the United States).

This is not the “assume” we are looking for. In ACSL, and now in VCC, the expectations of a function are introduced with the requires keyword. The keyword assume could have been used for that, but this is not the choice that was made, and Michał's e-mail is definitely not saying that ACSL is missing a keyword to introduce the precondition of a function.

Instead, in VCC, assume is the dual of assert. The assert annotation is attached to a specific point in a program and introduces a property that is supposed to hold (that the verifier should endeavor to verify) there:

  stuff;

/*@ assert property ; */

// the verifier tries to prove that "stuff" makes "property" hold.

The assume keyword tells the verifier to take it as granted that a property holds somewhere, and to use that for reasoning about the rest of the program:

/*@ assume property ; */

// "property" is useful to understand what "stuff" does.

  stuff;


Beware that ACSL, since it does not use assume this way, reserves the keyword assumes for something else entirely. When the rest of this post says that ACSL does not have assume, it means that ACSL does not have a VCC-like assume.

Keyword minimalism

It is a worthy goal to minimize the number of keywords forced on you by a BISL—Behavioral Interface Specification Language (the acronym is catching on already). One could argue that both the functionality of assume and assert above can be captured with a single keyword. The verifier, when encountering that keyword, should first try to establish the property as a consequence of the code above. Then, regardless of whether it was able to establish it, it should incorporate it in its hypotheses for the study of the code below it.

This is exactly how Frama-C approaches the issue. Plug-ins that understand ACSL at all both try to prove such properties and then use them as assumptions. The properties are introduced with the assert keyword which, in ACSL, conveys both meanings. This may mean that a bit of redundant work is made, but the resulting annotation language is so beautiful, like a smartphone with only two hardware buttons, that the sacrifice is worth it.


Well, this explanation is not actually quite right. Please read on.

Plug-in collaboration and property status management

There is a deeper explanation than just “each plug-in can do the work both ways and that will save a keyword”. This other explanation is specific to what Frama-C is. Or rather what Frama-C intends to be, for it is a long-term effort.


Frama-C is—or intends to be—a collaborative verification framework. Frama-C makes it convenient to seamlessly subject one same annotated program to different and complementary analysis techniques. In particular, it was always the intention that whichever property was assumed by a plug-in could be proved by another. For the Frama-C platform to be usable this way, it is necessary, but not sufficient, to have a single annotation language with a formal semantics that all plug-ins agree on, to express properties like \valid(p).

It is also necessary to use a single keyword for assumptions and goals, because without that, the user would need to go to the source code and change assume to assert or vice-versa when switching from a plug-in to another. In Frama-C, a plug-in's assumption is another plug-in's goal.


The Frama-C kernel tracks each program property and its verification status. A property that was only ever assumed, never proved, is an assumption of the whole verification. A property that was assumed by a plug-in and proved by another is just like an intermediate lemma in a mathematical proof.

The devil is in the details, of course, and the details are in the article “Combining Analyses for C Program Verification” (FMICS2012) by my colleagues Loïc Correnson and Julien Signoles.


Trying to prove a goal can still be expensive and unnecessary. There will eventually be a way to mark a property as “assumed” in the Frama-C kernel for efficiency reasons. We are still thinking about the best way to do that.


Thanks to Julien Signoles, Florent Kirchner, Virgile Prevosto, Sébastien Bardin and Michał Moskal for their remarks on earlier iterations of this blog post.

Thursday, August 2 2012

restrict is not good for modular reasoning

ACSL

There were quite a few posts recently that were concerned with ACSL, and a few more are planned for the near future. ACSL is a specification language for C (comparable with JML for Java, for those who know about JML). Some people call it a BISL, for “Behavioral Interface Specification Language”, but considering that this acronym's definition needs to be introduced everytime, on account of nobody in the world having ever seen it before, and that it has very little reuse potential, please forget that we mentioned it.

ACSL is Frama-C's annotation language, and it is partially implemented in Frama-C, but it is not intended to be just Frama-C's. According to this message, Microsoft Research's VCC now uses an annotation language with ACSL-like syntax and semantics.


ACSL allows to write function contracts, the stuff that modular reasoning about programs is made of. A function contract is supposed to tell you everything you need to know about the function, so that you do not need to look at its code. The function might still be unimplemented, even.

ACSL also allows to write annotations that go inside the function and are specific to one particular implementation. These annotations are typically hints to allow the verification that the specific implementation satisfies the contract. They can be, for instance, loop invariants or assertions.

It is good on several levels to be able to write and read properties about a C program in a formal language with unambiguous meaning. A compiler could parse and interpret ACSL to obtain information about the programmer's intentions, and take advantage of the opportunity to generate faster code. Hey, compiler makers, I said FASTER CODE.

The restrict type qualifier, again

Recently Stephen Canon reminded me that the good thing about restrict is that it is part of the function prototype, and I showed the differences with ACSL's approach.

Why does ACSL introduce an new notion, the \separated predicate, where C99 already has restrict? One reason that I pointed out is that while restrict offers a very concise syntax for some specialized uses, \separated is useful for other things that are not convenient to express with restrict.

Another reason is that a C prototype with restrict annotations still does not work very well as a contract. The restrict qualifier in the prototype implicitly refers to the function's code. This blog post expands on this.

The meaning of restrict

The restrict qualifier applied to a pointer more or less means that any offset of the pointer can only alias with an offset of the pointer. Please read C99's 6.7.3.1 section for the exact definition. I will simply quote the example from paragraph 8 there:


---

The function parameter declarations in the following example

void f(int n, int * restrict p, int * restrict q)
{ 
  while (n-- > 0) 
  *p++ = *q++; 
}

assert that, during each execution of the function, if an object is accessed through one of the pointer parameters, then it is not also accessed through the other.

---

Note that the notion of any offset of a pointer p being separated from everything else that is not also an offset of p is a relative notion. In the absolute, p has to point somewhere. If for instance p points inside a global array t, then p aliases with t. The restrict qualifier only means that locally, inside the function f, for what f does, an offset of p does not alias with any lvalue not based on p. As the next paragraph in the standard puts it:


---

The benefit of the restrict qualifiers is that they enable a translator to make an effective dependence analysis of function f without examining any of the calls of f in the program. The cost is that the programmer has to examine all of those calls to ensure that none give undefined behavior. For example, the second call of f in g has undefined behavior because each of d[1] through d[49] is accessed through both p and q.

void g(void) 
{ 
  extern int d[100]; 
  f(50, d + 50, d); //valid 
  f(50, d + 1, d); //undefined behavior 
} 

---

The call f(50, d + 50, d) is valid because although offsets of d+50 can obviously alias with offsets of d, they do not alias inside the function for what the function does. When calling f(50, d + 1, d);, offsets of the two restrict-qualified pointers do alias, and the call invokes undefined behavior.

No modular reasoning

And this sums up why restrict alone would not be good for modular reasoning about programs. It is part of the prototype, and the prototype is a kind of limited contract, but the exact meaning of restrict depends on the function's implementation. If the function computes p + 49, then p + 49 is part of the offsets of p that must not alias with the rest of the memory locations accessed by f. If f accesses the global variable G by name, then p must not point to G. if f does not access G by name, then p is allowed to point to G. It is only one particular implementation of f that gives meaning to restrict.

By contrast, a proper contract is supposed to be self-contained. The ACSL precondition \separated(p+(0..n-1), q+(0..n-1)); tells the programmer exactly how the arguments p and q are allowed to alias. An implementation for f needs not exist for the programmer to start using it with the confidence that ey is calling f properly.

Tuesday, July 31 2012

On arrays vs. pointer, the ACSL way

Some time ago, we saw that in C arrays and pointers have some subtle differences. A facetious colleague just remarked that this is also the case in ACSL, especially if you use the \at(e,L) construction, which basically says that e is supposed to be evaluated in the state when the program reached label L for the last time. For instance, let us consider the following function:

int t[10];

void f() {

  int* p = t;
  int i = 4;
 L0:
  t[i++] = 4;
 L1:
  t[i--] = 5;
 L2:
  t[i++] = 6;
  return;
}

Can you decide which of the following assertions hold just before f returns?

  1. assert \at(t[\at(i,L0)],L1) == 4;
  2. assert \at(t,L1)[\at(i,L0)] == 4;
  3. assert \at(p[\at(i,L0)],L1) == 4;
  4. assert \at(p,L1)[\at(i,L0)] == 4;

Answers

  /*@ assert \at(t[\at(i,L0)],L1) == 4; */

This assertion is true: namely, i is equal to 4 in L0, so that we're evaluating t[4] in L1, which is indeed equal to 4.

/*@ assert \at(t,L1)[\at(i,L0)] == 4; */

This assertion is true, but the evaluation is not that simple: according to ACSL's arrays semantics, we convert t into a logic array whose cells have the same value as the one of t in the state L1. Then we access the cell number 4, the value of i at L0, that happens to contain 4.

/*@ assert \at(p[\at(i,L0)],L1) == 4; */

This assertion is true, for the same reason as the first one.

/*@ assert \at(p,L1)[\at(i,L0)] == 4; */

This assertion is false, because the evaluation is slightly different than in the second case: This time, we evaluate the pointer p in L1, the offset i in L0, and we dereference the result in the current state, where *(p+4) is 6.

Wednesday, July 25 2012

The restrict qualifier as an element of specification

An insightful interjection

Stephen Canon, whose explanations about the fine points of floating-point I have appreciated on StackOverflow, chimed in on the subject of the restrict qualifier. With minor edits for consistency with this blog's formatting:

Consider memcpy(); its arguments are declared restrict. This not only means that the source and destination pointers cannot be the same (which is what Pascal's "replacement" says), but also means that the buffers cannot overlap at all (i.e. the src pointer cannot be anywhere in the range [dst-(length-1), dst+(length-1)]).

restrict also has the virtue of making this contract apparent to your callers, as it's part of the API. People should know that they can't call memcpy() with overlapping buffers, and use memmove() instead. If the information is instead encoded in a cryptic line in the implementation, then it isn't obvious to a developer who intends to use the function.

Non-aliasing granularity

I noted that my not-really-serious proposal did not have the same granularity as the original, with both good and bad consequences. An advantage of my proposal is that it is possible to specify only that p does not alias with q and that r does not alias with t. As defined in C99, restrict is more concise, but this also means that there are some properties you might want to specify and that you cannot.

restrict as an element of specification

But I failed to realize that restrict, being a type qualifier (for the casual C/C++ programmer: like const), is part of the function prototype. I was only thinking of the opportunity for better code generation inside f(), but it would theoretically be possible to have compilers that warn about passing aliasing pointers to f().


That's the theory anyway. In practice the situation is even worse than for const. For const you get type-system level warnings about casts discarding the qualifier. The warnings are sometimes useful and sometimes annoying. For restrict I do not know of any tool that makes any check of proper use at the call site.

An alternative

One tool that comes close to checking for proper use of functions with const and restrict-like specifications is developed nearby (have you heard of it?). One drawback is that it is work in progress. And it uses its own syntax for function specifications.


Here is the specification for memcpy() in that system:

/*@ requires \valid(((char*)dest)+(0..n-1))
  @          && \valid(((char*)src)+(0..n-1));
  @ requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));
  @ assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];
  @ ensures memcmp((char*)dest,(char*)src,n) == 0 && \result == dest;
  @*/
extern void *memcpy(void *restrict dest,
                    const void *restrict src, size_t n);


Yes, it is longer than just sprinkling the prototype with const and restrict, and I did not even include the definition of the logic predicate memcmp. But on the other hand, what you have here is a complete specification of memcpy(). You can ask your worst enemy to write a function to this specification, and ey proves that eir implementation satisfies the specification, you can rest assured that, for all valid inputs, the function terminates and computes the right result. The worst your enemy could do would be to provide you with a function that is arbitrarily slower than necessary.

Side note: solutions to the problem of your worst enemy providing you with a function that is too slow are being researched, but even researchers working on that would agree that this field has not reached the maturity of the solutions to the problem of your worst enemy providing you with a function that functionally does the wrong thing.

I can provide a neat trick for verifying that a function executes in constant time, but this is only a very partial solution to the problem.


Enough about this hypothetical worst enemy. The const part of memcpy()'s specification, and more, is contained in the clause:

assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];

The restrict part is contained in:

requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));

I might be biased, but I think it is just as clear, and separate concerns are better separated. The \separated predicate, for instance, is also useful for expressing exactly the conditions under which the statement (*p=*p) & (*q=*q); is defined.


The ACSL by example library contains more examples of function contracts.

Monday, November 15 2010

Loop assigns, part 3: On the importance of loop invariants

This post is the third of a serie dealing with loop assigns. The first can be found here and the second here.

Continue reading...

Sunday, October 17 2010

loop assigns, part 2

This post shows various ways to specify the loop assigns which modifies at each step a pointer and its content. This is the answers to the exercise of the preceding post on this subject.

Continue reading...

Thursday, October 7 2010

Specification of loop assigns

ACSL's assigns and loop assigns clauses are a quick way to specify the intended effects of a given piece of code. However, the semantics of loop assigns might sometimes be a bit difficult to grasp. This post tries to clarify the situation.

Continue reading...