Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - tutorial

Entries feed - Comments feed

Tuesday, April 11 2017

A simple EVA tutorial, part 3

On the previous post we've seen how to run EVA, but at the end we had a NON TERMINATING FUNCTION for a function that is supposed to always terminate, a likely indication that a definitive undefined behavior has been found in the analysis. In this post, we will see how to diagnose such cases, using derived plug-ins and the GUI.

We will reuse the save file produced at the end of the analysis, value2.sav.

When the GUI is not enough

Usually, after running the value analysis in the command-line, we launch the GUI to visualize the results:

frama-c-gui -load value2.sav

In this case, because of the NON TERMINATING FUNCTION message, we know that at some point in the program we will have red statements in the GUI, which indicate unreachable code.

By scrolling down from the main function, we reach the non-terminating statement, which is a call to test_x25519:

Unreachable code in the Frama-C GUI

Note the red semicolon at the end of the statement, and the fact that the following statements are also red. If we click on the statement, the Information panel says that This call never terminates.

You can right-click on the function and Go to the definition of test_x25519, and you will find the same thing inside, this time a call to crypto_x25519_public_key, and so on, until you reach fe_tobytes, which is slightly different: it contains a for loop (defined via a macro FOR), after which all statements are red, but the loop itself is not an infinite loop: it simply iterates i from 0 to 5. How can this be non-terminating?

The answer, albeit non-intuitive, is simple: there is one statement inside the loop which is non-terminating, but not during the first iteration of the loop. Because the GUI colors are related to the consolidated result of all callstacks (i.e., if there is at least one path which reaches a statement, it is marked as reachable), it cannot show precisely which callstack led to the non-termination. To better understand what happens here, we will use the Nonterm plug-in.

Nonterm to the rescue

Nonterm is a small plug-in that uses the result of the value analysis to display callstack-wise information about non-terminating statements, by emitting warnings when such statements are found. It requires EVA to have been executed previously, but it runs very quickly itself, so you do not need to save its results. Close the GUI and re-run it again, this time with Nonterm:

frama-c-gui -load value2.sav -then -nonterm -nonterm-ignore exit

The -nonterm-ignore exit argument serves to minimize the number of warnings related to calls to the libc exit() function, which is always non-terminating.

The warnings generated by Nonterm are displayed in the Messages panel, after those emitted by EVA.

Examples of Nonterm warnings

The warnings display the non-terminating callstacks. The order of the warnings themselves is not relevant. However, some kinds of warnings are more useful than others. Here is a rough indication of their relevance, from most to least precise:

  1. Non-terminating statements;
  2. Non-terminating loops;
  3. Non-terminating function calls;
  4. Unreachable returns.

In our analysis, the first (and only) warning about a non-terminating statement is the following:

Non-terminating statement

Note a few important details about the Frama-C GUI:

  • When you click on the warning in the Messages panel, the GUI focuses on the related statement.
  • When a statement has associated annotations (here, two warnings), the focus is placed on the first annotation, instead of the statement itself. This does not imply that the annotation itself is related to this specific warning.
  • The property status indicators (colored circles, or bullets on the left of each property) display the consolidated status of all callstacks; in particular, if the property is definitively valid in one callstack, but possibly/definitively invalid in another, the GUI displays a yellow bullet.

Nonterm restores some of the information lost due to callstack consolidation. The highlighted warning in particular gives us the following information:

  1. There exists a stack trace in which statement h5 -= c5 << 25 does not terminate;
  2. There is exactly one stack trace in which the statement never terminates; all other stack traces (which are not shown in the warning) terminate.

Currently, it is not possible to select a stack trace from the Messages panel, but we can do so using the Values panel. If we switch to it (keeping the statement highlighted in the source code), we can see that there are 40 different stack traces reaching this point.

Values Panel

The Values panel is arguably the most powerful inspection tool for the EVA plug-in in the Frama-C GUI. Some of its features were presented in earlier posts, but for the sake of completeness, here are some commented screenshots:

Values panel

The values displayed in this panel are related to the green highlighted zone in the Cil source.

The Ctrl+E shortcut is equivalent to highlighting a statement, then right-clicking Evaluate ACSL term. The Ctrl+Shift+E shortcut is slightly more powerful: it also evaluates terms, such as \valid(p). This command is not available from any menus.

The Multiple selections checkbox allows adding several expressions to be compared side-by-side. When checked, highlighting an expression in the same statement adds a column with its value. Note that highlighting a different statement results in resetting all columns.

The three checkboxes to the right are seldom used: Expand rows simply expands all callstacks (but generates visual clutter); Consolidated value displays the row all (union of all callstacks); and Per callstack displays a row for each separate callstack.

The callstacks display has several contextual menus that can be accessed via right-clicks.

Callstacks display

Let us start from the bottom: right-clicking on a callstack shows a popup menu that allows you to focus on a given callstack. This focus modifies the display in the Cil code viewer: reachability will only be displayed for the focused callstack(s). We will come back to that later.

Right-clicking on a cell containing a value allows filtering on all callstacks for which the expression has the same value. This is often used, for instance, to focus on all callstacks in which a predicate evaluates to invalid or unknown.

Finally, clicking on the column headers allows filtering columns.

Note that the Callstacks column header displays a pipette icon when a filter is being applied, to remind you that other callstacks exist.

Filtering non-terminating callstacks

In our code, despite the existence of 40 callstacks, only one of them is non-terminating. If you highlight the 0 ≤ c5 expression before statement h5 -= c5 << 25, you will see that only a single callstack displays invalid in the column 0 ≤ c5. Focus on this callstack using the popup menu, then highlight expression c5 in the Cil code. You will obtain the following:

Focused on a non-terminating callstack

As you can see, the GUI now displays the statements following h5 -= c5 << 25 in red, indicating thay they are unreachable in the currently focused callstacks. The exact value that caused this is shown in column c5: -1. The C standard considers the left-shift of a negative number as undefined behavior. Because -1 is the only possible value in this callstack, the reduction caused by the alarm leads to a post-state that is <BOTTOM>.

Proceeding with the analysis

To allow EVA to continue the analysis of the code, we need to modify it in some way. Since we are not experts in cryptography, we are unable to provide a definitive explanation why the code was written this way. In any case, it is not specific to Monocypher, but also present in TweetNaCl and ref10, two cryptographic libraries.

It is likely that replacing the signed carry variables in function fe_mul with unsigned ones would get rid of the undefined behavior, without changing the expected behavior of the code. However, without a more formal analysis performed by a cryptographer, this is just guesswork. Still, we need to do something to be able to continue the analysis (and possibly spot more undefined behaviors), such as changing the declarations of variables c0 to c9 to u64 instead of i64. Then, re-parse the sources, re-run the analysis, and keep iterating.

Ideas for complex situations

In the beta version of this post, we were using version 0.1 of Monocypher, which had a different version of functions related to fe_mul. In particular, some of the functions were taken from TweetNaCl, and the code was not unrolled the same way as in Monocypher 0.3. One of the consequences was that Nonterm was unable to show as clear a picture as in this case; it was necessary to perform syntactic loop unrolling (e.g., using loop pragma UNROLL) just to be able to clearly see in the GUI which statement was non-terminating.

Future developments in Frama-C and in the EVA plug-in will help identifying and visualizing such situations more easily.

Acknowledgments

We would like to thank loup-vaillant (Monocypher's author) for the discussions concerning Monocypher and EVA's analysis. New versions of Monocypher have been released since the analysis performed for this series of posts, which do not present the undefined behavior described in this post.

Friday, March 17 2017

A simple EVA tutorial, part 2

On the previous post we've seen some recommendations about using Frama-C/EVA, and some tips about parsing. In this post, we will see how to run EVA, and how to quickly setup it for a more precise result.

We will reuse the save file produced at the end of the parsing, parsed.sav.

First run of EVA

The default parameters of EVA are intended for a fast analysis. In Frama-C 14 (Silicon), option -val-builtins-auto is recommended to enable the usage of built-in functions that improve the precision and sometimes the speed of the analysis1.

1 This option will be enabled by default in Frama-C 15 (Phosphorus).

The following command-line should result in a relatively quick first analysis:

frama-c -load parsed.sav -val-builtins-auto -val -save value.sav

Note that we save the result in another file. It can be reused as input for another analysis, or visualization in the GUI.

The analysis will likely output many alarms, some due to loss of precision, others due to an incorrect setup. Here are a few important alarms concerning an incorrect setup:

  1. Missing code or specification

    file.c:42:[kernel] warning: Neither code nor specification for function foo, generating default assigns from the prototype
    

    There are two major causes for this warning: (1) the file containing the function definition was not given to Frama-C during parsing; or (2) the function has no source code and no ACSL specification was given.

    In the first case, the solution is to include the missing source file. Parsing will succeed even if only a declaration (function prototype) is present, but EVA requires more than that. It may be necessary to return to the parsing stage when this arrives.

    In the second case, you must supply an ACSL specification for the function, otherwise EVA will assume it has no effect, which may be unsound. To do it with minimal modifications to the original code, you can do the following:

    1. create a file, say stubs.h;
    2. copy the function prototype to be stubbed in stubs.h (adding the necessary #includes for the types used in the prototype);
    3. add an ACSL specification to this prototype;
    4. include stubs.h in the original source, either by adding #include "stubs.h", or using GCC's -include option (e.g. -cpp-extra-args="-includestubs.h", without spaces between -include and the file name).
  2. Missing assigns clause, or missing \from

    When analyzing functions without source code, EVA imposes some constraints on the ACSL specification: they must contain assigns clauses, and these clauses must have \from dependencies. Otherwise, warnings such as the following may be generated:

    foo.c:1:[value] warning: no 'assigns \result \from ...' clause specified for function foo
    foo.c:3:[value] warning: no \from part for clause 'assigns *out;' of function foo
    foo.c:6:[kernel] warning: No code nor implicit assigns clause for function foo, generating default assigns from the prototype
    

    The following is an example of an incomplete specification:

    /*@ assigns *out; */
    void foo(int in, int *out);
    

    Even if it contains an \assigns clause for pointer out, it does not say where the result comes from. Adding \from in, for instance, makes the specification complete from the point of view of EVA.

    Note: EVA cannot verify the correctness of the specification in the absence of code, especially that of ensures clauses. If you provide an incorrect specification, the result may be unsound. For that reason, it is often useful to write a simplified (or abstract) implementation of the function and then run the analysis. If EVA has both the code and the specification, it is able to check ensures clauses and detect some kinds of errors.

Running EVA on monocypher

Running EVA on parsed.sav will start the value analysis on the main function defined in test.c. Due to the large number of small functions in Monocypher, EVA will output a huge amount of lines, whenever a new function is entered. Adding option -no-val-show-progress will omit messages emitted whenever entering a new function.

Also, the fact that this code contains lots of small functions with few or no side-effects is a very strong indicator that -memexec-all will be very helpful in the analysis.

Memexec, which is part of EVA, acts as a cache that allows reusing the result of function calls when their memory footprint is unchanged. It dramatically improves performance.

Combining both options, we obtain the following command-line:

frama-c -load parsed.sav -val -val-builtins-auto \
        -no-val-show-progress -memexec-all -save value.sav

The analysis will then start and emit several warnings (mainly due to imprecisions). It should finish in a few seconds, depending on your processor speed.

Improving EVA results

After running the value analysis, it is a good time to check what the result looks like, using the GUI:

frama-c-gui -load value.sav

In the screenshot below, we indicate some parts of the GUI that are useful when inspecting EVA results (besides the source view). We also indicate some parts that are never (or rarely) used with EVA.

Frama-C GUI for EVA

Note that the Properties tab (between Console and Values) is not updated automatically: you need to click on the Refresh button before it outputs anything, and after changing filters.

Several tips concerning this panel were presented in a previous post about the GUI. If you follow them, you will be able to make the Properties panel show the total of Unknown (unproven) properties for the entire program, and only those. This number is often similar to the number of messages in the Messages panel.

In Monocypher, after setting the filters to show every Unknown property in the entire program, and clicking Refresh, we obtain over 900 unproven properties. Since the analysis was not tuned at all for precision (other than with -val-builtins-auto), this number is not particularly surprising.

A quick way to improve on results is to use the Loop analysis plug-in.

The Loop analysis plug-in performs a mostly syntactic analysis to estimate loop bounds in the program (using heuristics, without any soundness guarantees) and outputs a list of options to be added to the value analysis. Running EVA again with these options should improve the precision, although it may increase analysis time. Loop analysis' main objective is to speed up the repetitive task of finding loop bounds and providing them as semantic unrolling (-slevel) counters. The analysis may miss some loops, and the estimated bounds may be larger or smaller, but overall it minimizes the amount of manual work required.

Loop analysis does not depend on EVA, but if it has been run, the results may be more precise. In Monocypher, both commands below give an equivalent result (the difference is not significative in this context):

frama-c -load parsed.sav -loop
frama-c -load value.sav -loop

In both cases, Loop analysis' effect is simply to produce a text output that should be fed into EVA for a new analysis:

[loop] Add this to your command line:
       -val-slevel-merge-after-loop crypto_argon2i \
       -val-slevel-merge-after-loop crypto_blake2b_final \
       ...

You should, by now, use a shell script or a Makefile to run the Frama-C command line, adding all the -val-slevel-merge-after-loop and -slevel-function lines to your command.

Let us consider that the environment variable LOOPFLAGS contains the result of Loop analysis, and EVAFLAGS contains the flags mentioned previously (-no-val-show-progress, -val-builtins-auto and -memexec-all). Then the following command will re-run EVA with a more detailed (and, hopefully, precise) set of parameters:

frama-c -load parsed.sav $LOOPFLAGS -val $EVAFLAGS -save value2.sav

Opening this file on the GUI will indicate approximately 500 warnings, which is still substantial, but much better than before. Improvements to Loop analysis in the next release of Frama-C will allow this number to be reduced slightly.

The basic tutorial is over, but there are several paths to choose

From here on, there are several possibilities to reduce the imprecisions in the analysis:

  • Inspect alarms and see if their functions contain loops that were not inferred by Loop analysis; if so, adding their bounds to -slevel-function can improve the precision of the analysis;

  • Increase the precision using other parameters, such as -plevel;

  • Stub libc functions to emulate/constrain inputs when relevant;

  • Use EVA's abstract domains (e.g. -eva-equality-domain) to improve precision;

  • Stop at the first few alarms (-val-stop-at-nth-alarm), to track more closely the sources of imprecision. However, when there are hundreds of alarms, this is more useful as a learning experience than as a practical solution.

Each solution is more appropriate in a specific situation. Here are a few tips for an intermediate-level user of EVA:

  1. Functions that perform array initialization are often simple (a loop with a few assignments per iteration), so unrolling them completely should not slow down the analysis excessively. The Loop analysis plug-in usually works with them, but some pattern variations may throw it off. You may want to check the proposed values in such loops. Because initialization happens early in program execution, checking such loops may yield good results.

  2. The plevel parameter is often used in response to messages such as:

    monocypher.c:491:[kernel] more than 200(255) elements to enumerate. Approximating.
    

    where the first number is the current plevel (by default, 200), and the second number is the amount that would be required to avoid the approximation. In this case, -plevel 255 would be reasonable, but if you had more than 200(67108864) elements, for instance, it would not be helpful to set the plevel to such a high value.

  3. Stubbing is a good approach when dealing with functions that are closely system-dependent, specifically input functions that read from files, sockets, or from the command-line. Check the Frama-C builtins in __fc_builtin.h, they provide some useful primitives for abstracting away code with non-deterministic functions.

  4. EVA's domains have specific trade-offs between precision and efficiency, and some have broader applicability than others. Future posts in this blog will describe some of these domains, but as a rule of thumb, two domains that are fairly low-cost and generally useful are -eva-equality-domain (for syntactic equalities) and -eva-gauges-domain (for some kinds of loops).

  5. The Messages panel in the GUI is chronologically sorted, so it can help the user follow what the analysis did, to try and identify sources of imprecision. However, even in this case, there is still an advantage to using -val-stop-at-nth-alarm: because the execution stops abruptly, there are possibly less callstacks displayed in the GUI, and therefore it may be easier to see at a glance which parts of the code were actually executed, and the dependencies between values that lead to the alarm.

Non-terminating function?

The "beginner" tutorial ends here, but one thing that you may have noticed after running EVA, is the dreaded "non terminating function" message at the end of the analysis:

[value:final-states] Values at end of function main:
  NON TERMINATING FUNCTION

This indicates that, somewhere during the analysis, a completely invalid state was found, and EVA could not proceed. This usually indicates one of the following:

  1. EVA's setup is incorrect: most likely, some function has missing or incorrect specifications, or some case that cannot be currently handled by EVA (e.g. recursive calls) was encountered.
  2. A definitively undefined behavior is present in the code, which may or may not lead to an actual bug during execution. In either case, it should be taken care of.

We will see how to handle such situations in the next post, using the GUI and the Nonterm plug-in (-nonterm), in a tutorial destined for beginners and experienced users alike.

Tuesday, March 7 2017

A simple EVA tutorial, part 1

(with the collaboration of T. Antignac, Q. Bouillaguet, F. Kirchner and B. Yakobowski)

This is the first of a series of posts on a new EVA tutorial primarily aimed at beginners (some of the later posts contain more advanced content).

Reminder: EVA is the new name of the Value analysis plug-in.

There is a Value tutorial on Skein-256 that is part of the Value Analysis user manual. The present tutorial is complementary and presents some new techniques available in Frama-C. If you intend to use EVA, we recommend you read the Skein-256 tutorial as well because it details several things that will not be repeated here. (However, it is not required to have read the Skein-256 before this one.)

The source code used in this tutorial is the version 0.3 of Monocypher, a C99-conformant crytographic library that also includes a nice test suite.

Note: newer versions of Monocypher are available! For this tutorial, please ensure you download version 0.3, otherwise you will not obtain the same behavior as described in this tutorial.

Future posts will include more advanced details, useful for non-beginners as well, so stay tuned!

Starting with Frama-C/EVA

This tutorial will use Monocypher's code, but it should help you to figure out how to analyze your code as well. First and foremost, it should help you answer these questions:

  1. Is my code suitable for a first analysis with Frama-C/EVA?
  2. How should I proceed?

(Un)Suitable code

There are lots of C code in the wild; for instance, searching Github for language:C results in more than 250k projects. However, many of them are not suitable candidates for a beginner, for reasons that will be detailed in the following.

Note that you can analyze several kinds of codes with Frama-C/EVA. However, without previous experience, some of them will raise many issues at the same time, which can be frustrating and inefficient.

Here is a list of kinds of code that a Frama-C/EVA beginner should avoid:

  1. Libraries without tests

    EVA is based on a whole-program analysis. It considers executions starting from a given entry point. Libraries without tests contain a multitude of entry points and no initial context for the analysis. Therefore, before analyzing them, you will have to write your own contexts1 or test cases.

  2. Command-line tools that rely heavily on the command-line (e.g. using getopt), without test cases.

    Similar to the previous situation: a program that receives all of its input from the command-line behaves like a library. Command-line parsers and tools with complex string manipulation are not the best use cases for the EVA's current implementation. A fuzzer might be a better tool in this case (though a fuzzer will only find bugs, not ensure their absence). Again, you will have to provide contexts1 or test cases.

  3. Code with lots of non-C99 code (e.g. assembly, compiler extensions)

    Frama-C is based on the C standard, and while it includes numerous extensions to handle GCC and MSVC-specific code, it is a primarily semantic-based tool. Inline assembly is supported syntactically, but its semantics needs to be given via ACSL annotations. Exotic compiler extensions are not always supported. For instance, trying to handle the Linux kernel without previous Frama-C experience is a daunting task.

  4. Code relying heavily on libraries (including the C standard library)

    Frama-C ships with an annotated standard library, which has ACSL specifications for many commonly-used functions (e.g. string.h and stdlib.h). This library is however incomplete and in some cases imprecise2. You will end up having to specify and refine several functions.

1 A context, here, is similar to a test case, but more general. It can contain, for instance, generalized variables (e.g. by using Frama_C_interval or ACSL specifications).

2 A balance is needed between conciseness (high-level view), expressivity, and precision (implementation-level details). The standard library shipped with Frama-C tries to be as generic as possible, but for specific case studies, specialized specifications can provide better results.

Each new version of Frama-C brings improvements concerning these aspects, but we currently recommend you try a more suitable code base at first. If your objective is to tackle such a challenging code base, contact us! Together we can handle such challenges more efficiently.

This explains why Monocypher is a good choice: it has test cases, it is self-contained (little usage of libc functions), and it is C99-conforming.

The 3-step method

In a nutshell, the tutorial consists in performing three steps:

  1. Parsing the code (adding stubs if necessary);
  2. Running EVA with mostly default parameters (for a first, approximated result);
  3. Tuning EVA and running it again.

The initial parsing is explained in this post, while the other steps will be detailed in future posts.

General recommendations

Before starting the use of Frama-C, we have some important general recommendations concerning the EVA plug-in:

  1. DO NOT start with the GUI. Use the command-line. You should consider Frama-C/EVA as a command-line tool with a viewer (the GUI). The Frama-C GUI is not an IDE (e.g. you cannot edit code with it), and EVA does not use the GUI for anything else other than rendering its results.

  2. Use scripts. Even a simple shell script, just to save the command-line options, is already enough for a start. For larger code bases, you will want Makefiles or other build tools to save time.

  3. Use frama-c -kernel-help (roughly equivalent to the Frama-C manpage) and frama-c -value-help to obtain information about the command-line options. Each option contains a brief description of what it does, so grepping the output for keywords (frama-c -kernel-help | grep debug for instance) is often useful. Otherwise, consider Stack Overflow - there is a growing base of questions and answers available there.

  4. Advance one step at a time. As you will see, the very first step is to parse the code, and nothing else. One does not simply run EVA, unless he or she is very lucky (or the program is very simple). Such precautions may seem excessive at first, but being methodical will save you time in the long run.

Parsing the code

Often overlooked, this step is erroneously considered as "too simple" ("just give all files to the command-line!"). In a few cases, it is indeed possible to run frama-c *.c -val and succeed in parsing everything and running EVA.

When this does not work, however, it is useful to isolate the steps to identify the error. Here are some general recommendations:

  1. Start with few files, and include more when needed

    Note that parsing may succeed even if some functions are only declared, but not defined. This will of course prevent EVA from analyzing them. If so, you may have to return to this step later, adding more files to be parsed.

  2. Ensure that preprocessor definitions and inclusions are correct

    Several code bases require the use of preprocessor definitions (-D in GCC) or directory inclusions (-I in GCC) in order for the code to be correctly preprocessed. Such information is often available in Makefiles, and can be given to Frama-C using e.g. -cpp-extra-args="-DFOO=bar -Iheaders".

    -cpp-extra-args is the most useful option concerning this step. It is used in almost every case study, and often the only required option for parsing. Note: old releases of Frama-C did not have this option, and -cpp-command was recommended instead. Nowadays, -cpp-command is rarely needed and should be avoided, because it is slightly more complex to use.

  3. Make stubs for missing standard library definitions

    Frama-C's standard library is incomplete, especially for system-dependent definitions that are not in C99 or in POSIX. Missing constants, for instance, may require the inclusion of stub files (e.g. stubs.h) with the definitions and/or the ACSL specifications. A common way to include such files is to use GCC's -include option, documented here.

  4. Save the result

    Use Frama-C's -save/-load options to avoid having to reparse the files each time. There is no default extension associated with Frama-C save files, although .sav is a common choice. For instance, running:

    frama-c <parse options> -save parsed.sav
    

    will try to parse the program and, if it succeeds, will save the Frama-C session to parsed.sav. You can then open it in the GUI (frama-c-gui -load parse.sav), to see what the normalized source code looks like, or use it as an input for the next step.

Reminder: for the EVA plug-in, the GUI is not recommended for parametrizing/tuning an analysis. It is best used as a viewer for the results.

The default output of EVA is rather verbose but very useful for studying small programs. For realistic case studies, however, you may want to consider the following options:

  • -no-val-show-progress: does not print when entering a new function. This will be the default in Frama-C 15 (Phosphorus);

  • -value-msg-key=-initial-state: does not print the initial state;

  • -value-msg-key=-final-states: does not print the final states of the analysis.

Note the minus symbols (-) before initial-state and final-states. They indicate we want to hide the messages conditioned by these categories.

Parsing monocypher

As indicated above, the naive approach (frama-c *.c) does not work with monocypher:

$ frama-c *.c

[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing)
[kernel] Parsing monocypher.c (with preprocessing)
[kernel] Parsing more_speed.c (with preprocessing)
[kernel] syntax error at more_speed.c:15:
         13
         14    // Specialised squaring function, faster than general multiplication.
         15    sv fe_sq(fe h, const fe f)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
         16    {
         17        i32 f0 = f[0]; i32 f1 = f[1]; i32 f2 = f[2]; i32 f3 = f[3]; i32 f4 = f[4];

The first line is always printed when Frama-C parses a source file, and can be ignored.

The second line indicates that monocypher.c is being parsed.

The third line indicates that more_speed.c is now being parsed, implying that the parsing of monocypher.c ended without issues.

Finally, we have a parsing error in more_speed.c, line 15. That line, plus the lines above and below it, are printed in the console.

Indeed, the file more_speed.c is not a valid C source (sv is not a type defined in that file, and it does not include any other files). But this is not an actual issue, since more_speed.c is not part of the library itself, simply an extra file (this can be confirmed by looking into the makefile). Thus we are going to restrict the set of files Frama-C is asked to analyze.

Note: Frama-C requires the entire program to be parsed at once. It may be necessary to adapt compilation scripts to take that into account.

We also see that the rule for building monocypher.o includes a preprocessor definition, -DED25519_SHA512. We will add that to our parsing command, which will then become:

frama-c test.c sha512.c monocypher.c -cpp-extra-args="-DED25519_SHA512" -save parsed.sav

The lack of error messages is, in itself, a resounding success.

The first part of this tutorial ends here. See you next week!

For now, you can start reading the Skein-256 tutorial available at the beginning of the EVA manual. Otherwise, if you already know EVA (and still decided to read this), you may try to find some undefined behavior (UB) in Monocypher 0.3!

Hint: There is indeed some UB, although it does not impact the code in any meaningful way. At least not with today's compilers, maybe in the future... and anyway, it has been fixed in the newer releases of Monocypher.

Wednesday, October 12 2016

A mini ACSL tutorial for Value, part 3: indirect assigns

To conclude our 3-part series on ACSL specifications for Value, we present a feature introduced in Frama-C Aluminium that allows more precise specifications: the indirect label in ACSL assigns clauses. The expressivity gains when writing \froms are especially useful for plugins such as Value.

Indirect assigns

Starting in Frama-C Aluminium (20150601), assigns clauses (e.g. assigns x \from src) accept the keyword indirect (e.g. assigns x \from indirect:src), stating that the dependency from src to x is indirect, that is, it does not include data dependencies between src and x. In other words, src itself will never be directly assigned to x.

Indirect dependencies are, most commonly, control dependencies, in which src affects x by controlling whether some instruction will modify the value of x. Another kind of indirect dependency are address dependencies, related to the computation of addresses for pointer variables.

Let us once again refer to our running example, the specification and mock implementation of safe_get_random_char. As a reminder, here's its specification, without the ensures \subset(*out,...) postcondition, as suggested in the previous post:

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

Here's the mock implementation of the original function, in which ensures \subset(*out,...) holds:

#include "__fc_builtin.h"
#include <stdlib.h>
typedef enum { OK, NULL_PTR, INVALID_LEN } status;

status safe_get_random_char(char *out, char const *buf, unsigned n) {
  if (out == NULL || buf == NULL) return NULL_PTR;
  if (n == 0) return INVALID_LEN;
  *out = buf[Frama_C_interval(0,n-1)];
  return OK;
}

And here's the main function used in our tests:

void main() {
  char *msg = "abc";
  int len_arr = 4;
  status res;
  char c;
  res = safe_get_random_char(&c, msg, len_arr);
  //@ assert res == OK;
  res = safe_get_random_char(&c, NULL, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(NULL, msg, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(&c, msg, 0);
  //@ assert res == INVALID_LEN;
}

In the mock implementation, we see that out and buf are tested to see if they are equal to NULL, but their value itself (i.e., the address they point to) is never actually assigned to *out; only the characters inside buf may be assigned to it. Therefore, out and buf are both control dependencies (in that, they control whether *out = buf[Frama_C_interval(0,n-1)] is executed), and thus indirect as per our definition.

n is also a control dependency of the assignment to *out, due to the check if (n == 0). n also appears in buf[Frama_C_interval(0,n-1)], leading this time to an address dependency: in lval = *(buf + Frama_C_interval(0,n-1)), lval depends indirectly on every variable used to compute the address that will be dereferenced (buf, n, and every variable used by Frama_C_interval in this case).

If we run Value using our specification, this is the result:

[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  msg ∈ {{ "abc" }}
  len_arr ∈ {4}
  res ∈ {2}
  c ∈
   {{ garbled mix of &{c; "abc"}
    (origin: Arithmetic {file.c:33}) }}

Note that c has a garbled mix which includes c itself, plus the string literal "abc". The culprit is this assigns clause:

assigns *out \from out, buf, buf[0 .. n-1], n;

out is c and buf is "abc". n, despite also being a dependency, does not contribute to the garbled mix because it is a scalar. The garbled mix appears because, in some functions, it is the address of the pointer itself that is assigned to the lvalue in the assigns clause. Without a means of distinguishing between direct and indirect dependencies, one (dangerous) workaround is to omit some dependencies from the clauses. This may lead to incorrect results.

Thanks to indirect \from clauses, now we can avoid the garbled mix by specifying that out and buf are only indirect dependencies. Applying the same principle to all assigns clauses, we obtain the final version of our (fixed) specification:

/*@
  assigns \result \from indirect:out, indirect:buf, indirect:n;
  assigns *out \from indirect:out, indirect:buf, buf[0 .. n-1], indirect:n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from indirect:out, indirect:buf, indirect:n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from indirect:out, indirect:buf, indirect:n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

Note that indirect dependencies are implied by direct ones, so they never need to be added twice.

With this specification, Value will return c ∈ [--..--], without garbled mix. The result is still imprecise due to the lack of ensures, but better than before. Especially when trying Value on a new code base (where most functions have no stubs, or only simple ones), the difference between garbled mix and [--..--] (often called top int, that is, the top value of the lattice of integer ranges) is significant.

In the new specification, it is arguably easier to read the dependencies and to reason about them: users can skip the indirect dependencies when reasoning about the propagation of imprecise pointer values.

The new specification is more verbose because indirect is not the default. And this is so in order to avoid changing the semantics of existing specifications, which might become unsound.

The From plugin has a new (experimental) option, --show-indirect-deps, which displays the computed dependencies using the new syntax. It is considered experimental simply because it has not yet been extensively used in industrial applications, but it should work fine. Do not hesitate to tell us if you have issues with it.

Ambiguously direct dependencies

It is not always entirely obvious whether a given dependency can be safely considered as indirect, or if it should be defined as direct. This is often the case when a function has an output argument that is related to the length (or size, cardinality, etc.) of one of its inputs. strnlen(s, n) is an example of a libc function with that property: it returns n itself when s is longer than n characters.

Let us consider the following function, which searches for a character in an array and returns its offset, or the given length if not found:

// returns the index of c in buf[0 .. n-1], or n if not found
/*@ assigns \result \from indirect:c, indirect:buf,
                          indirect:buf[0 .. n-1], indirect:n; */
int indexOf(char c, char const *buf, unsigned n);

Our specification seems fine: the result value is usually the number of loop iterations, and therefore it depends indirectly on the related arguments.

However, the following implementation contradicts it:

int indexOf(char c, char const *buf, unsigned n) {
  unsigned i;
  for (i = 0; i < n; i++) {
    if (buf[i] == c) return i;
  }
  return n;
}

void main() {
  int i1 = indexOf('a', "abc", 3);
  int i2 = indexOf('z', "abc", 3);
}

If we run frama-c -calldeps -show-indirect-deps (that is, run the From plugin with callwise dependencies, showing indirect dependencies) in this example, we will obtain this output:

[from] ====== DISPLAYING CALLWISE DEPENDENCIES ======
[from] call to indexOf at ambiguous.c:10 (by main):
  \result FROM indirect: c; buf; n; "abc"[bits 0 to 7]
[from] call to indexOf at ambiguous.c:11 (by main):
  \result FROM indirect: c; buf; n; "abc"[bits 0 to 23]; direct: n
[from] entry point:
  NO EFFECTS
[from] ====== END OF CALLWISE DEPENDENCIES ======

Note that, in the second call, n is computed as a direct dependency. Indeed, it is directly assigned to the return value in the code. This means that our specification is possibly unsound, since it states that n is at most an indirect dependency of \result.

However, if we modify our implementation of indexOf to return i instead of n, then From will compute n as an indirect dependency, and thus our specification could be considered correct. The conclusion is that, in some situations, both versions can be considered correct, and this not will affect the end result.

One specific case where such discussions may be relevant is the case of the memcmp function of the C standard library (specified in string.h): one common implementation consists in comparing each byte of both arrays, say s1[i] and s2[i], and returning s1[i] - s2[i] if they are different. One could argue that such an implementation would imply that assigns \result \from s1[0 ..], s2[0 ..], with direct dependencies. However, this can create undesirable garbled mix, so a better approach would be to consider them as indirect dependencies. In such situations, the best specification is not a clear-cut decision.

Frama-C libc being updated

The Frama-C stdlib has lots of specifications that still need to be updated to take indirect into account. This is being done over time, which means that unfortunately they do not yet constitute a good example of best practices. This is improving with each release, and soon they should offer good examples for several C standard library functions. Until then, you may refer to this tutorial or ask the Frama-C community for recommendations to your specifications.

Also note that some of these recommendations may not be the most relevant ones when considering other plugins, such as WP. Still, most tips here are sufficiently general that they should help you improve your ACSL for all purposes.

Friday, September 30 2016

A mini ACSL tutorial for Value, part 2: functional dependencies

In our previous post, we left you in a cliffhanger: which \from is missing from our ACSL specification for safe_get_random_char? In this post, we explain the functional dependencies in our specification, how to test them, and then present the missing dependency.

Where do the \from come from?

Our complete specification for safe_get_random_char in the previous post includes several functional dependencies. Some of them are obvious, others not so much. For easier reference, here is the specification once again:

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
    ensures \subset(*out, buf[0 .. n-1]);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

As a reminder, assigns a \from b1, b2, ... specifies that there are control and/or data dependencies from b1, b2, ... to a. Data dependencies are direct or indirect assignments (e.g. a = b1 or c = b1; a = c), and control dependencies are related to control-flow (e.g. if (b1 && b2) { a = 1; } else { a = 2; }). Value uses them for several purposes, such as:

  • precision: to define the possible origins of pointers, otherwise imprecise values will degenerate into "modifies anything";
  • efficiency: to avoid recomputing the state during a leaf function call, if the only difference is in the values of variables that no one depends on.

Value always requires that assigns clauses contain \from dependencies. Since \from are an overapproximation of the actual dependencies, in case of doubt the safe bet is to include more than the necessary. Too many \from, however, may lead to imprecise analyses.

One way to consider whether a given location should be part of a \from is to think in terms of variation: if the actual value of the location changed, would it possibly affect the assigned lvalue? If so, then there is a dependency.

Let us revisit our first get_random_char function, in particular its assigns clause:

//@ assigns \result \from buf[0 .. n-1], n;
char get_random_char(char const *buf, unsigned n);

If we change the value of any character between buf[0] and buf[n-1], this can have a direct impact on the result. Changing the value of n obviously also has an effect: there are more characters to choose from.

However, one thing that does not affect the result is the address of buf itself, that is, the contents of the buf pointer: if the buffer starts on address 0x42, or on address 0xfffe1415, the result of the function is the same, as long as the precondition is valid (i.e., the memory location is readable) and the contents of the buffer are the same.

But why, then, do we have buf (the pointer, not the pointed value) as part of the \from in safe_get_random_char (copied below)?

//@ assigns \result \from out, buf, n;
status safe_get_random_char(char *out, char const *buf, unsigned n);

Note that out is also present as a dependency, and for the same reason: the NULL pointer. In our specification, we included assumes clauses such as out == \null || buf == \null. We may expect, therefore, that our function will be able to distinguish whether any of these pointers is null, very likely via a test such as the following:

...
if (out == NULL || buf == NULL) { return INVALID_LEN; }
...

Such a test introduces control dependencies between out, buf and \result: if out is 0, the result is different than if out is nonzero (i.e. non-null). For that reason, out and buf must be included as dependencies. Conversely, note that the contents of the buffer itself do not affect \result.

When in doubt, try some code

A good way to check a specification is (when possible) to write an abstracted code of the function under test, then run Value (and sometimes From) and see if it matches the expectations.

For safe_get_random_char, a few lines of code suffice to obtain a roughly equivalent version of our specification:

#include "__fc_builtin.h"
#include <stdlib.h>
typedef enum { OK, NULL_PTR, INVALID_LEN } status;

status safe_get_random_char(char *out, char const *buf, unsigned n) {
  if (out == NULL || buf == NULL) return NULL_PTR;
  if (n == 0) return INVALID_LEN;
  *out = buf[Frama_C_interval(0,n-1)];
  return OK;
}

Note the usage of the built-in Frama_C_interval (included via __fc_builtin.h) to simulate a random value between 0 and n-1 (inclusive).

Now we need a main function to invoke our code. We will reuse the one we defined before, since its calls activate all branches in our code. Re-running Value, this time using the code plus the specification, will prove all pre- and post-conditions, except the one related to \subset(*out, buf[0 .. n-1]). This is just an imprecision in Value that may disappear in the future.

We can also try running the From plugin (frama-c -deps), without our specification, to see if the dependencies it infers are compatible with the ones we specified in our assigns clauses.

For function safe_get_random_char, From will infer the following dependencies related to \result and *out:

  ...
  a FROM Frama_C_entropy_source; out; buf; n; "abc" (and SELF)
  \result FROM out; buf; n
  ...

Frama_C_entropy_source comes from the usage of Frama_C_interval. For now, let us focus on the other variables:

  • a is actually *out, and "abc" is actually buf[0 .. n-1]. The result of the From plugin is very concrete, so we have to abstract a few variable names.
  • the SELF dependence related to *out simply means that the variable may not be assigned in the function (e.g. when an error is found). This information is not present in assigns clauses. In particular, for Value it could be useful to have a "must assign" clause, describing an underapproximation of assignments that are certain to happen in every execution. This is not present in ACSL, but it can often be compensated with the use of behaviors with refined assigns and ensures clauses, as we did.
  • all dependencies, both for *out and \result, are the ones we had predicted before, except for Frama_C_entropy_source.

Variables such as Frama_C_entropy_source are often used to represent internal states of elements that are abstracted away in specifications. In this case, it can be seen as the internal state of the (pseudo-)random number generator that allows the result of our safe_get_random_char to be non-deterministic. Without it, we would have a function that could be assumed to return the same value every time that the same input parameters were given to it.

Thus, to really model a randomized result, we need to add such a variable to our specification. It is important not to forget to also assign to the variable itself, to represent the fact that its own internal state changed during the call. In our specification, it would be sufficient to add the following line to the global assigns clause:

  assigns Frama_C_entropy_source \from Frama_C_entropy_source;

With this final assigns clause, our specification is indeed complete. This concludes our tutorial on ACSL specifications oriented towards an analysis with Value!

A minor variant, an unexpected result

Let us consider a minor variant of our specification: what if we remove the postcondition ensures \subset(*out, buf[0 .. n-1])? We would expect Value to return [--..--], since *out is said to be assigned by the function, but we did not specify which values it may contain. However, this is what we obtain instead, after the call to safe_get_random_char(&c, msg, len_msg):

  c ∈
   {{ garbled mix of &{c; "abc"} (origin: Arithmetic {file.c:44}) }}

This undesirable garbled mix indicates that some pointer information leaked to the contents of the variable c, more precisely, that the address of c (out) itself could be contained in c. We know this is not the case, since in every reasonable implementation of safe_get_random_char, the address of out is never directly assigned to *out. However, until Frama-C Magnesium, there was no way to specify the distinction between dependencies that may impact the actual contents of an lvalue from dependencies which are only indirectly related to its value (e.g. control dependencies, such as testing whether out is NULL or not).

Because of that limitation, there were typically two ways to deal with that: omit dependencies (dangerous!) or accept the introduction of garbled mix. Frama-C Aluminium brought a new way to handle them that solves both issues. This will be the subject of the next post in this blog. Stay tuned!

Friday, September 23 2016

A mini-tutorial of ACSL specifications for Value

(with the collaboration of F. Kirchner, V. Prevosto and B. Yakobowski)

Users of the Value plugin often need to use functions for which there is no available code, or whose code could be abstracted away. In such cases, ACSL specifications often come in handy. Our colleagues at Fraunhofer prepared the excellent ACSL by example report, but it is mostly directed at WP-style proofs.

In this post we explain how to specify in ACSL a simple function, in a way that is optimal for Value.

Prerequisites:

  • Basic knowledge of Value;
  • Basic knowledge of ACSL.

The messages and behaviors presented here are those of Frama-C Aluminium. Using another version of Frama-C might lead to different results.

A simple function

In this tutorial, we proceed in a gradual manner to isolate problems and make sure that each step works as intended. Let us consider the following informal specification:

// returns a random character between buf[0] and buf[n-1]
char get_random_char(char const *buf, unsigned n);

A minimal ACSL specification for this function must check two things:

  1. that n is strictly positive: otherwise, we'd have to select a character among an empty set, thereby killing any logician that would wander around at that time (buf is a byte array that may contain \0, so that there is no obvious way to return a default value in case n == 0);
  2. that we are allowed (legally, as per the C standard) to read characters between buf[0] and buf[n-1];

The following specification ensures both:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
*/
char get_random_char(char const *buf, unsigned n);

Note the spaces between the bounds in 0 .. n-1. A common issue, when beginning to write ACSL specifications, is to write ranges such as 0..n-1 without spaces around the dots. It works in most cases, but as 0..n is a floating-point preprocessing token (yes, this looks strange, but you can see it yourself in section 6.4.8 of C11 standard if you're not convinced) funny things might happen when pre-processing the annotation. For instance, if we had a macro MAX instead of n, (e.g. #define MAX 255), then writing [0..MAX] would result in the following error message: [kernel] user error: unbound logic variable MAX, since the pre-processor would dutifully consider 0..MAX as a single token, thus would not perform the expansion of MAX. For that reason, we recommend always writing ranges with spaces around the dots.

To check our specification, we devise a main function that simply calls get_random_char with the appropriate arguments:

void main() {
  char *buf = "abc";
  char c = get_random_char(buf, 3);
}

Then, if we run this with Value (frama-c -val), it will produce the expected result, but with a warning:

[kernel] warning: No code nor implicit assigns clause for function get_random_char, generating default assigns from the prototype

By printing the output of Frama-C's normalisation (frama-c -print), we can see what its generated assigns clause looks like:

assigns \result;
assigns \result \from *(buf+(0 ..)), n;

Despite the warning, Frama-C kernel generated a clause that is correct and not too imprecise in this case. But we should not rely on that and avoid this warning whenever possible, by writing our own assigns clauses:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
  assigns \result \from buf[0 .. n-1], n;
*/
char get_random_char(char const *buf, unsigned n);

Running Value again will not emit any warnings, plus it will indicate that both preconditions were validated:

[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing)
[kernel] Parsing file.c (with preprocessing)
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] Values of globals at initialization

[value] computing for function get_random_char <- main.
        Called from file.c:10.
[value] using specification for function get_random_char
file.c:2:[value] function get_random_char: precondition got status valid.
file.c:3:[value] function get_random_char: precondition got status valid.
[value] Done for function get_random_char
[value] Recording results for main
[value] done for function main
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  buf ∈ {{ "abc" }}
  c ∈ [--..--]

However, the result is still imprecise: c ∈ [--..--], that is, it may contain any character, not only 'a', 'b' or 'c'. This is expected: assigns \from ... clauses are used by Value to compute dependencies information, but they are not sufficient to constrain the exact contents of the assigned variables. We need to use postconditions for that, that is, ensures clauses.

The following ensures clause states that the result value must be one of the characters in buf[0 .. n-1]:

ensures \subset(\result, buf[0 .. n-1]);

The ACSL \subset predicate is interpreted by Value, which isolates the characters in buf, and thus returns a precise result: c ∈ {97; 98; 99}. Note that Value prints the result as integers, not as ASCII characters.

This concludes the specification of a very simple partial function. But what if we want to take into account more possibilities, e.g. a NULL pointer in buf, or n == 0?

First try: a robust and concise specification

Our get_random_char function is simple, but not robust. It is vulnerable to accidents and attacks. Let us apply some Postel-style recommendations and be liberal in what we accept from others: instead of expecting that the user will not pass a NULL pointer or n == 0, we want to accept these cases, but return an error code if they happen.

We have two choices: add an extra argument to get_random_char that represents the result status (OK/error), or use the return value as the status itself, and return the character via a pointer. Here, we will adopt the latter approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

// if buf != NULL and n > 0, copies a random character from buf[0 .. n-1]
// into *out and returns OK.
// Otherwise, returns NULL_PTR if buf == NULL, or INVALID_LEN if n == 0.
status safe_get_random_char(char *out, char const *buf, unsigned n);

We could also have used errno here, but global variables are inelegant and, most importantly, it would defeat the purpose of this tutorial, wouldn't it?

This new function is more robust to user input, but it has a price: its specification will require more sophistication.

Before revealing the full specification, let us try a first, somewhat naive approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

/*@
  assigns \result, *out \from buf[0 .. n-1], n;
  ensures \subset(*out, buf[0 .. n-1]);
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *buf = "abc";
  char c;
  status res = safe_get_random_char(&c, buf, 3);
  //@ assert res == OK;
}

This specification has several errors, but we will try it anyway. We will reveal the errors as we progress.

Once again, we use a small C main function to check our specification, starting with the non-erroneous case. This reveals that Value does not return the expected result:

...
[value] Values at end of function main:
  buf ∈ {{ "abc" }}
  c ∈ [--..--] or UNINITIALIZED
  res ∈ {0}

Variable c is imprecise, despite our \ensures clause. This is due to the fact that Value will not reduce the contents of a memory location that may be uninitialized. Thus, when specifying the ensures clause of a pointed variable, it is first necessary to state that the value of that variable is properly initialized.

This behavior may evolve in future Frama-C releases. In particular, our specification could have resulted in a more precise result, such as c ∈ {97; 98; 99} or UNINITIALIZED.

Adding ensures \initialized(out); before the \ensures \subset(...) clause will allow c to be precisely reduced to {97; 98; 99}. This solves our immediate problem, but creates another: is the specification too strong? Could we have an implementation of get_random_char in which \initialized(out) does not hold?

The answer here is definitely yes, especially in the case where buf is NULL. It is arguably also the case when n == 0, although it depends on the implementation.

The most precise and correct result that we expect here is c ∈ {97; 98; 99} or UNINITIALIZED. To obtain it, we will use behaviors.

Another reason to consider using behaviors is the fact that our \assigns clause is too generic, leading to avoidable warnings by Value: because we have written that *out may be assigned, Value will try to evaluate if out is a valid pointer. If our main function includes a test such as res = safe_get_random_char(NULL, buf, 3), Value will output the following:

[value] warning: Completely invalid destination for assigns clause *out. Ignoring.

This warning is not needed here, but its presence is justified by the fact that, in many cases, it helps detect incorrect specifications, such as an extra *. Value will still accept our specification, but if we want a precise analysis, the best solution is to use behaviors to specify each case.

Second try: using behaviors

The behaviors of our function correspond to each of the three cases of enum status:

  1. NULL_PTR when either out or buf are NULL;
  2. INVALID_LEN when out and buf are not NULL but n == 0;
  3. OK otherwise.

ACSL behaviors can be named, improving readability and generating more precise error messages. We will define a set of complete and disjoint behaviors; this is not required in ACSL, but for simple functions it often matches more closely what the code would actually do, and is simpler to reason about.

One important remark is that disjoint and complete behaviors are not checked by Value. The value analysis does take them into account to improve its precision when possible, but if they are incorrectly specified, Value may not be able to warn the user about it.

Here is one way to specify three disjoint behaviors for this function, using mutually exclusive assumes clauses.

/*@
  behavior ok:
    assumes out != null && buf != \null;
    assumes n > 0;
    requires \valid_read(buf+(0 .. n-1));
    requires \valid(out);
    // TODO: assigns and ensures clauses
  behavior null_ptr:
    assumes out == \null || buf == \null;
    // TODO: assigns and ensures clauses
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    // TODO: assigns and ensures clauses
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

Our choice of behaviors is not unique: we could have defined, for instance, two behaviors for null pointers, e.g. null_out and null_buf; but the return value is the same for both, so this would not improve precision.

Note that assumes and requires play different roles in ACSL: assumes are used to determine which behavior(s) are active, while requires impose constraints on the pre-state of the function or current behavior. For instance, one should not specify assumes \valid(out) in one behavior and assumes !\valid(out) in another: what this specification would actually mean is that the corresponding C function should somehow be able to distinguish between locations where it can write and locations where it cannot, as if one could write if (valid_pointer(buf)) in the C code. In practical terms, the main consequence is that such a specification would prevent Value from detecting errors related to memory validity.

For a more concrete example of the difference between them, you may consult the small appendix at the end of this post.

Assigns clauses in behaviors

Our specification for safe_get_random_char is incomplete: it has no assigns or ensures clauses.

Writing assigns clauses in behaviors is not always trivial, so here is a small, simplified summary of the main rules concerning the specification of such clauses for an analysis with Value:

  1. Global (default) assigns must always be present (even for complete behaviors), and must be at least as general as the assigns clauses in each behavior;
  2. Behaviors only need assigns clauses when they are more specific than the global one.
  3. Having a complete behaviors clause allows the global behavior to be ignored during the evaluation of the post-state, which may lead to a more precise result; but the global assigns must still be present.

Note: behavior inclusion is not currently checked by Value. Therefore, if a behavior's assigns clause is not included in the default one, the result is undefined.

Also, the reason why assigns specifications are verbose and partially redundant is in part because not every Frama-C plugin is able to precisely handle behaviors. They use the global assigns in this case.

For ensures clauses, the situation is simpler: global and local ensures clauses are simply merged with an implicit logical AND between them.

The following specification is a complete example of the usage of behaviors, with precise ensures clauses for both outputs (\result and out). The main function below tests each use case, and running Value results in valid statuses for all preconditions and assertions. The \from terms in the assigns clauses are detailed further below.

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
    ensures \subset(*out, buf[0 .. n-1]);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *msg = "abc";
  int len_arr = 4;
  status res;
  char c;
  res = safe_get_random_char(&c, msg, len_arr);
  //@ assert res == OK;
  res = safe_get_random_char(&c, NULL, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(NULL, msg, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(&c, msg, 0);
  //@ assert res == INVALID_LEN;
}

Our specification includes several functional dependencies (\from), but there is still one missing. Can you guess which one? The answer, as well as more details about why and how to write functional dependencies, will appear in the next post. Stay tuned!

Appendix: example of the difference between requires and assumes clauses

This appendix presents a small concrete example of what happens when the user mistakingly uses requires clauses instead of assumes. It is directed towards beginners in ACSL.

Consider the (incorrect) example below, where bzero_char simply writes 0 to the byte pointed by the argument c:

/*@
  assigns *c \from c;
  behavior ok:
    assumes \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes !\valid(c);
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this example, Value will evaluate the validity of the pointer c, and conclude that, because it comes from a string literal, it may not be written to (therefore \valid(c) is false). Value then does nothing and returns, without any warnings or error messages.

However, had we written it the recommended way, the result would be more useful:

/*@
  assigns *c \from c;
  behavior ok:
    assumes c != \null;
    requires \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes c == \null;
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this case, Value will evaluate c as non-null, and will therefore activate behavior ok. This behavior has a requires clause, therefore Value will check that memory location c can be written to. Because this is not the case, Value will emit an alarm:

[value] warning: function bzero_char, behavior ok: precondition got status invalid.

Note that checking whether c is null or non-null is something that can be done in the C code, while checking whether a given pointer p is a valid memory location is not. As a rule of thumb, conditions in the code correspond to assumes clauses in behaviors, while requires clauses correspond to semantic properties, function prerequisites that cannot necessarily be tested by the implementation.