Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Wednesday, October 12 2016

A mini ACSL tutorial for Value, part 3: indirect assigns

To conclude our 3-part series on ACSL specifications for Value, we present a feature introduced in Frama-C Aluminium that allows more precise specifications: the indirect label in ACSL assigns clauses. The expressivity gains when writing \froms are especially useful for plugins such as Value.

Indirect assigns

Starting in Frama-C Aluminium (20150601), assigns clauses (e.g. assigns x \from src) accept the keyword indirect (e.g. assigns x \from indirect:src), stating that the dependency from src to x is indirect, that is, it does not include data dependencies between src and x. In other words, src itself will never be directly assigned to x.

Indirect dependencies are, most commonly, control dependencies, in which src affects x by controlling whether some instruction will modify the value of x. Another kind of indirect dependency are address dependencies, related to the computation of addresses for pointer variables.

Let us once again refer to our running example, the specification and mock implementation of safe_get_random_char. As a reminder, here's its specification, without the ensures \subset(*out,...) postcondition, as suggested in the previous post:

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

Here's the mock implementation of the original function, in which ensures \subset(*out,...) holds:

#include "__fc_builtin.h"
#include <stdlib.h>
typedef enum { OK, NULL_PTR, INVALID_LEN } status;

status safe_get_random_char(char *out, char const *buf, unsigned n) {
  if (out == NULL || buf == NULL) return NULL_PTR;
  if (n == 0) return INVALID_LEN;
  *out = buf[Frama_C_interval(0,n-1)];
  return OK;
}

And here's the main function used in our tests:

void main() {
  char *msg = "abc";
  int len_arr = 4;
  status res;
  char c;
  res = safe_get_random_char(&c, msg, len_arr);
  //@ assert res == OK;
  res = safe_get_random_char(&c, NULL, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(NULL, msg, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(&c, msg, 0);
  //@ assert res == INVALID_LEN;
}

In the mock implementation, we see that out and buf are tested to see if they are equal to NULL, but their value itself (i.e., the address they point to) is never actually assigned to *out; only the characters inside buf may be assigned to it. Therefore, out and buf are both control dependencies (in that, they control whether *out = buf[Frama_C_interval(0,n-1)] is executed), and thus indirect as per our definition.

n is also a control dependency of the assignment to *out, due to the check if (n == 0). n also appears in buf[Frama_C_interval(0,n-1)], leading this time to an address dependency: in lval = *(buf + Frama_C_interval(0,n-1)), lval depends indirectly on every variable used to compute the address that will be dereferenced (buf, n, and every variable used by Frama_C_interval in this case).

If we run Value using our specification, this is the result:

[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  msg ∈ {{ "abc" }}
  len_arr ∈ {4}
  res ∈ {2}
  c ∈
   {{ garbled mix of &{c; "abc"}
    (origin: Arithmetic {file.c:33}) }}

Note that c has a garbled mix which includes c itself, plus the string literal "abc". The culprit is this assigns clause:

assigns *out \from out, buf, buf[0 .. n-1], n;

out is c and buf is "abc". n, despite also being a dependency, does not contribute to the garbled mix because it is a scalar. The garbled mix appears because, in some functions, it is the address of the pointer itself that is assigned to the lvalue in the assigns clause. Without a means of distinguishing between direct and indirect dependencies, one (dangerous) workaround is to omit some dependencies from the clauses. This may lead to incorrect results.

Thanks to indirect \from clauses, now we can avoid the garbled mix by specifying that out and buf are only indirect dependencies. Applying the same principle to all assigns clauses, we obtain the final version of our (fixed) specification:

/*@
  assigns \result \from indirect:out, indirect:buf, indirect:n;
  assigns *out \from indirect:out, indirect:buf, buf[0 .. n-1], indirect:n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from indirect:out, indirect:buf, indirect:n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from indirect:out, indirect:buf, indirect:n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

Note that indirect dependencies are implied by direct ones, so they never need to be added twice.

With this specification, Value will return c ∈ [--..--], without garbled mix. The result is still imprecise due to the lack of ensures, but better than before. Especially when trying Value on a new code base (where most functions have no stubs, or only simple ones), the difference between garbled mix and [--..--] (often called top int, that is, the top value of the lattice of integer ranges) is significant.

In the new specification, it is arguably easier to read the dependencies and to reason about them: users can skip the indirect dependencies when reasoning about the propagation of imprecise pointer values.

The new specification is more verbose because indirect is not the default. And this is so in order to avoid changing the semantics of existing specifications, which might become unsound.

The From plugin has a new (experimental) option, --show-indirect-deps, which displays the computed dependencies using the new syntax. It is considered experimental simply because it has not yet been extensively used in industrial applications, but it should work fine. Do not hesitate to tell us if you have issues with it.

Ambiguously direct dependencies

It is not always entirely obvious whether a given dependency can be safely considered as indirect, or if it should be defined as direct. This is often the case when a function has an output argument that is related to the length (or size, cardinality, etc.) of one of its inputs. strnlen(s, n) is an example of a libc function with that property: it returns n itself when s is longer than n characters.

Let us consider the following function, which searches for a character in an array and returns its offset, or the given length if not found:

// returns the index of c in buf[0 .. n-1], or n if not found
/*@ assigns \result \from indirect:c, indirect:buf,
                          indirect:buf[0 .. n-1], indirect:n; */
int indexOf(char c, char const *buf, unsigned n);

Our specification seems fine: the result value is usually the number of loop iterations, and therefore it depends indirectly on the related arguments.

However, the following implementation contradicts it:

int indexOf(char c, char const *buf, unsigned n) {
  unsigned i;
  for (i = 0; i < n; i++) {
    if (buf[i] == c) return i;
  }
  return n;
}

void main() {
  int i1 = indexOf('a', "abc", 3);
  int i2 = indexOf('z', "abc", 3);
}

If we run frama-c -calldeps -show-indirect-deps (that is, run the From plugin with callwise dependencies, showing indirect dependencies) in this example, we will obtain this output:

[from] ====== DISPLAYING CALLWISE DEPENDENCIES ======
[from] call to indexOf at ambiguous.c:10 (by main):
  \result FROM indirect: c; buf; n; "abc"[bits 0 to 7]
[from] call to indexOf at ambiguous.c:11 (by main):
  \result FROM indirect: c; buf; n; "abc"[bits 0 to 23]; direct: n
[from] entry point:
  NO EFFECTS
[from] ====== END OF CALLWISE DEPENDENCIES ======

Note that, in the second call, n is computed as a direct dependency. Indeed, it is directly assigned to the return value in the code. This means that our specification is possibly unsound, since it states that n is at most an indirect dependency of \result.

However, if we modify our implementation of indexOf to return i instead of n, then From will compute n as an indirect dependency, and thus our specification could be considered correct. The conclusion is that, in some situations, both versions can be considered correct, and this not will affect the end result.

One specific case where such discussions may be relevant is the case of the memcmp function of the C standard library (specified in string.h): one common implementation consists in comparing each byte of both arrays, say s1[i] and s2[i], and returning s1[i] - s2[i] if they are different. One could argue that such an implementation would imply that assigns \result \from s1[0 ..], s2[0 ..], with direct dependencies. However, this can create undesirable garbled mix, so a better approach would be to consider them as indirect dependencies. In such situations, the best specification is not a clear-cut decision.

Frama-C libc being updated

The Frama-C stdlib has lots of specifications that still need to be updated to take indirect into account. This is being done over time, which means that unfortunately they do not yet constitute a good example of best practices. This is improving with each release, and soon they should offer good examples for several C standard library functions. Until then, you may refer to this tutorial or ask the Frama-C community for recommendations to your specifications.

Also note that some of these recommendations may not be the most relevant ones when considering other plugins, such as WP. Still, most tips here are sufficiently general that they should help you improve your ACSL for all purposes.

Friday, September 30 2016

A mini ACSL tutorial for Value, part 2: functional dependencies

In our previous post, we left you in a cliffhanger: which \from is missing from our ACSL specification for safe_get_random_char? In this post, we explain the functional dependencies in our specification, how to test them, and then present the missing dependency.

Where do the \from come from?

Our complete specification for safe_get_random_char in the previous post includes several functional dependencies. Some of them are obvious, others not so much. For easier reference, here is the specification once again:

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
    ensures \subset(*out, buf[0 .. n-1]);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

As a reminder, assigns a \from b1, b2, ... specifies that there are control and/or data dependencies from b1, b2, ... to a. Data dependencies are direct or indirect assignments (e.g. a = b1 or c = b1; a = c), and control dependencies are related to control-flow (e.g. if (b1 && b2) { a = 1; } else { a = 2; }). Value uses them for several purposes, such as:

  • precision: to define the possible origins of pointers, otherwise imprecise values will degenerate into "modifies anything";
  • efficiency: to avoid recomputing the state during a leaf function call, if the only difference is in the values of variables that no one depends on.

Value always requires that assigns clauses contain \from dependencies. Since \from are an overapproximation of the actual dependencies, in case of doubt the safe bet is to include more than the necessary. Too many \from, however, may lead to imprecise analyses.

One way to consider whether a given location should be part of a \from is to think in terms of variation: if the actual value of the location changed, would it possibly affect the assigned lvalue? If so, then there is a dependency.

Let us revisit our first get_random_char function, in particular its assigns clause:

//@ assigns \result \from buf[0 .. n-1], n;
char get_random_char(char const *buf, unsigned n);

If we change the value of any character between buf[0] and buf[n-1], this can have a direct impact on the result. Changing the value of n obviously also has an effect: there are more characters to choose from.

However, one thing that does not affect the result is the address of buf itself, that is, the contents of the buf pointer: if the buffer starts on address 0x42, or on address 0xfffe1415, the result of the function is the same, as long as the precondition is valid (i.e., the memory location is readable) and the contents of the buffer are the same.

But why, then, do we have buf (the pointer, not the pointed value) as part of the \from in safe_get_random_char (copied below)?

//@ assigns \result \from out, buf, n;
status safe_get_random_char(char *out, char const *buf, unsigned n);

Note that out is also present as a dependency, and for the same reason: the NULL pointer. In our specification, we included assumes clauses such as out == \null || buf == \null. We may expect, therefore, that our function will be able to distinguish whether any of these pointers is null, very likely via a test such as the following:

...
if (out == NULL || buf == NULL) { return INVALID_LEN; }
...

Such a test introduces control dependencies between out, buf and \result: if out is 0, the result is different than if out is nonzero (i.e. non-null). For that reason, out and buf must be included as dependencies. Conversely, note that the contents of the buffer itself do not affect \result.

When in doubt, try some code

A good way to check a specification is (when possible) to write an abstracted code of the function under test, then run Value (and sometimes From) and see if it matches the expectations.

For safe_get_random_char, a few lines of code suffice to obtain a roughly equivalent version of our specification:

#include "__fc_builtin.h"
#include <stdlib.h>
typedef enum { OK, NULL_PTR, INVALID_LEN } status;

status safe_get_random_char(char *out, char const *buf, unsigned n) {
  if (out == NULL || buf == NULL) return NULL_PTR;
  if (n == 0) return INVALID_LEN;
  *out = buf[Frama_C_interval(0,n-1)];
  return OK;
}

Note the usage of the built-in Frama_C_interval (included via __fc_builtin.h) to simulate a random value between 0 and n-1 (inclusive).

Now we need a main function to invoke our code. We will reuse the one we defined before, since its calls activate all branches in our code. Re-running Value, this time using the code plus the specification, will prove all pre- and post-conditions, except the one related to \subset(*out, buf[0 .. n-1]). This is just an imprecision in Value that may disappear in the future.

We can also try running the From plugin (frama-c -deps), without our specification, to see if the dependencies it infers are compatible with the ones we specified in our assigns clauses.

For function safe_get_random_char, From will infer the following dependencies related to \result and *out:

  ...
  a FROM Frama_C_entropy_source; out; buf; n; "abc" (and SELF)
  \result FROM out; buf; n
  ...

Frama_C_entropy_source comes from the usage of Frama_C_interval. For now, let us focus on the other variables:

  • a is actually *out, and "abc" is actually buf[0 .. n-1]. The result of the From plugin is very concrete, so we have to abstract a few variable names.
  • the SELF dependence related to *out simply means that the variable may not be assigned in the function (e.g. when an error is found). This information is not present in assigns clauses. In particular, for Value it could be useful to have a "must assign" clause, describing an underapproximation of assignments that are certain to happen in every execution. This is not present in ACSL, but it can often be compensated with the use of behaviors with refined assigns and ensures clauses, as we did.
  • all dependencies, both for *out and \result, are the ones we had predicted before, except for Frama_C_entropy_source.

Variables such as Frama_C_entropy_source are often used to represent internal states of elements that are abstracted away in specifications. In this case, it can be seen as the internal state of the (pseudo-)random number generator that allows the result of our safe_get_random_char to be non-deterministic. Without it, we would have a function that could be assumed to return the same value every time that the same input parameters were given to it.

Thus, to really model a randomized result, we need to add such a variable to our specification. It is important not to forget to also assign to the variable itself, to represent the fact that its own internal state changed during the call. In our specification, it would be sufficient to add the following line to the global assigns clause:

  assigns Frama_C_entropy_source \from Frama_C_entropy_source;

With this final assigns clause, our specification is indeed complete. This concludes our tutorial on ACSL specifications oriented towards an analysis with Value!

A minor variant, an unexpected result

Let us consider a minor variant of our specification: what if we remove the postcondition ensures \subset(*out, buf[0 .. n-1])? We would expect Value to return [--..--], since *out is said to be assigned by the function, but we did not specify which values it may contain. However, this is what we obtain instead, after the call to safe_get_random_char(&c, msg, len_msg):

  c ∈
   {{ garbled mix of &{c; "abc"} (origin: Arithmetic {file.c:44}) }}

This undesirable garbled mix indicates that some pointer information leaked to the contents of the variable c, more precisely, that the address of c (out) itself could be contained in c. We know this is not the case, since in every reasonable implementation of safe_get_random_char, the address of out is never directly assigned to *out. However, until Frama-C Magnesium, there was no way to specify the distinction between dependencies that may impact the actual contents of an lvalue from dependencies which are only indirectly related to its value (e.g. control dependencies, such as testing whether out is NULL or not).

Because of that limitation, there were typically two ways to deal with that: omit dependencies (dangerous!) or accept the introduction of garbled mix. Frama-C Aluminium brought a new way to handle them that solves both issues. This will be the subject of the next post in this blog. Stay tuned!