Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Friday, March 17 2017

A simple EVA tutorial, part 2

On the previous post we've seen some recommendations about using Frama-C/EVA, and some tips about parsing. In this post, we will see how to run EVA, and how to quickly setup it for a more precise result.

We will reuse the save file produced at the end of the parsing, parsed.sav.

First run of EVA

The default parameters of EVA are intended for a fast analysis. In Frama-C 14 (Silicon), option -val-builtins-auto is recommended to enable the usage of built-in functions that improve the precision and sometimes the speed of the analysis1.

1 This option will be enabled by default in a future release.

The following command-line should result in a relatively quick first analysis:

frama-c -load parsed.sav -val-builtins-auto -val -save value.sav

Note that we save the result in another file. It can be reused as input for another analysis, or visualization in the GUI.

The analysis will likely output many alarms, some due to loss of precision, others due to an incorrect setup. Here are a few important alarms concerning an incorrect setup:

  1. Missing code or specification

    file.c:42:[kernel] warning: Neither code nor specification for function foo, generating default assigns from the prototype
    

    There are two major causes for this warning: (1) the file containing the function definition was not given to Frama-C during parsing; or (2) the function has no source code and no ACSL specification was given.

    In the first case, the solution is to include the missing source file. Parsing will succeed even if only a declaration (function prototype) is present, but EVA requires more than that. It may be necessary to return to the parsing stage when this arrives.

    In the second case, you must supply an ACSL specification for the function, otherwise EVA will assume it has no effect, which may be unsound. To do it with minimal modifications to the original code, you can do the following:

    1. create a file, say stubs.h;
    2. copy the function prototype to be stubbed in stubs.h (adding the necessary #includes for the types used in the prototype);
    3. add an ACSL specification to this prototype;
    4. include stubs.h in the original source, either by adding #include "stubs.h", or using GCC's -include option (e.g. -cpp-extra-args="-includestubs.h", without spaces between -include and the file name).
  2. Missing assigns clause, or missing \from

    When analyzing functions without source code, EVA imposes some constraints on the ACSL specification: they must contain assigns clauses, and these clauses must have \from dependencies. Otherwise, warnings such as the following may be generated:

    foo.c:1:[value] warning: no 'assigns \result \from ...' clause specified for function foo
    foo.c:3:[value] warning: no \from part for clause 'assigns *out;' of function foo
    foo.c:6:[kernel] warning: No code nor implicit assigns clause for function foo, generating default assigns from the prototype
    

    The following is an example of an incomplete specification:

    /*@ assigns *out; */
    void foo(int in, int *out);
    

    Even if it contains an \assigns clause for pointer out, it does not say where the result comes from. Adding \from in, for instance, makes the specification complete from the point of view of EVA.

    Note: EVA cannot verify the correctness of the specification in the absence of code, especially that of ensures clauses. If you provide an incorrect specification, the result may be unsound. For that reason, it is often useful to write a simplified (or abstract) implementation of the function and then run the analysis. If EVA has both the code and the specification, it is able to check ensures clauses and detect some kinds of errors.

Running EVA on monocypher

Running EVA on parsed.sav will start the value analysis on the main function defined in test.c. Due to the large number of small functions in Monocypher, EVA will output a huge amount of lines, whenever a new function is entered. Adding option -no-val-show-progress will omit messages emitted whenever entering a new function.

Also, the fact that this code contains lots of small functions with few or no side-effects is a very strong indicator that -memexec-all will be very helpful in the analysis.

Memexec, which is part of EVA, acts as a cache that allows reusing the result of function calls when their memory footprint is unchanged. It dramatically improves performance.

Combining both options, we obtain the following command-line:

frama-c -load parsed.sav -val -val-builtins-auto \
        -no-val-show-progress -memexec-all -save value.sav

The analysis will then start and emit several warnings (mainly due to imprecisions). It should finish in a few seconds, depending on your processor speed.

Improving EVA results

After running the value analysis, it is a good time to check what the result looks like, using the GUI:

frama-c-gui -load value.sav

In the screenshot below, we indicate some parts of the GUI that are useful when inspecting EVA results (besides the source view). We also indicate some parts that are never (or rarely) used with EVA.

Frama-C GUI for EVA

Note that the Properties tab (between Console and Values) is not updated automatically: you need to click on the Refresh button before it outputs anything, and after changing filters.

Several tips concerning this panel were presented in a previous post about the GUI. If you follow them, you will be able to make the Properties panel show the total of Unknown (unproven) properties for the entire program, and only those. This number is often similar to the number of messages in the Messages panel.

In Monocypher, after setting the filters to show every Unknown property in the entire program, and clicking Refresh, we obtain over 900 unproven properties. Since the analysis was not tuned at all for precision (other than with -val-builtins-auto), this number is not particularly surprising.

A quick way to improve on results is to use the Loop analysis plug-in.

The Loop analysis plug-in performs a mostly syntactic analysis to estimate loop bounds in the program (using heuristics, without any soundness guarantees) and outputs a list of options to be added to the value analysis. Running EVA again with these options should improve the precision, although it may increase analysis time. Loop analysis' main objective is to speed up the repetitive task of finding loop bounds and providing them as semantic unrolling (-slevel) counters. The analysis may miss some loops, and the estimated bounds may be larger or smaller, but overall it minimizes the amount of manual work required.

Loop analysis does not depend on EVA, but if it has been run, the results may be more precise. In Monocypher, both commands below give an equivalent result (the difference is not significative in this context):

frama-c -load parsed.sav -loop
frama-c -load value.sav -loop

In both cases, Loop analysis' effect is simply to produce a text output that should be fed into EVA for a new analysis:

[loop] Add this to your command line:
       -val-slevel-merge-after-loop crypto_argon2i \
       -val-slevel-merge-after-loop crypto_blake2b_final \
       ...

You should, by now, use a shell script or a Makefile to run the Frama-C command line, adding all the -val-slevel-merge-after-loop and -slevel-function lines to your command.

Let us consider that the environment variable LOOPFLAGS contains the result of Loop analysis, and EVAFLAGS contains the flags mentioned previously (-no-val-show-progress, -val-builtins-auto and -memexec-all). Then the following command will re-run EVA with a more detailed (and, hopefully, precise) set of parameters:

frama-c -load parsed.sav $LOOPFLAGS $EVAFLAGS -save value2.sav

Opening this file on the GUI will indicate approximately 500 warnings, which is still substantial, but much better than before. Improvements to Loop analysis in the next release of Frama-C will allow this number to be reduced slightly.

The basic tutorial is over, but there are several paths to choose

From here on, there are several possibilities to reduce the imprecisions in the analysis:

  • Inspect alarms and see if their functions contain loops that were not inferred by Loop analysis; if so, adding their bounds to -slevel-function can improve the precision of the analysis;

  • Increase the precision using other parameters, such as -plevel;

  • Stub libc functions to emulate/constrain inputs when relevant;

  • Use EVA's abstract domains (e.g. -eva-equality-domain) to improve precision;

  • Stop at the first few alarms (-val-stop-at-nth-alarm), to track more closely the sources of imprecision. However, when there are hundreds of alarms, this is more useful as a learning experience than as a practical solution.

Each solution is more appropriate in a specific situation. Here are a few tips for an intermediate-level user of EVA:

  1. Functions that perform array initialization are often simple (a loop with a few assignments per iteration), so unrolling them completely should not slow down the analysis excessively. The Loop analysis plug-in usually works with them, but some pattern variations may throw it off. You may want to check the proposed values in such loops. Because initialization happens early in program execution, checking such loops may yield good results.

  2. The plevel parameter is often used in response to messages such as:

    monocypher.c:491:[kernel] more than 200(255) elements to enumerate. Approximating.
    

    where the first number is the current plevel (by default, 200), and the second number is the amount that would be required to avoid the approximation. In this case, -plevel 255 would be reasonable, but if you had more than 200(67108864) elements, for instance, it would not be helpful to set the plevel to such a high value.

  3. Stubbing is a good approach when dealing with functions that are closely system-dependent, specifically input functions that read from files, sockets, or from the command-line. Check the Frama-C builtins in __fc_builtin.h, they provide some useful primitives for abstracting away code with non-deterministic functions.

  4. EVA's domains have specific trade-offs between precision and efficiency, and some have broader applicability than others. Future posts in this blog will describe some of these domains, but as a rule of thumb, two domains that are fairly low-cost and generally useful are -eva-equality-domain (for syntactic equalities) and -eva-gauges-domain (for some kinds of loops).

  5. The Messages panel in the GUI is chronologically sorted, so it can help the user follow what the analysis did, to try and identify sources of imprecision. However, even in this case, there is still an advantage to using -val-stop-at-nth-alarm: because the execution stops abruptly, there are possibly less callstacks displayed in the GUI, and therefore it may be easier to see at a glance which parts of the code were actually executed, and the dependencies between values that lead to the alarm.

Non-terminating function?

The "beginner" tutorial ends here, but one thing that you may have noticed after running EVA, is the dreaded "non terminating function" message at the end of the analysis:

[value:final-states] Values at end of function main:
  NON TERMINATING FUNCTION

This indicates that, somewhere during the analysis, a completely invalid state was found, and EVA could not proceed. This usually indicates one of the following:

  1. EVA's setup is incorrect: most likely, some function has missing or incorrect specifications, or some case that cannot be currently handled by EVA (e.g. recursive calls) was encountered.
  2. A definitively undefined behavior is present in the code, which may or may not lead to an actual bug during execution. In either case, it should be taken care of.

We will see how to handle such situations in the next post, using the GUI and the Nonterm plug-in (-nonterm), in a tutorial destined for beginners and experienced users alike.

Tuesday, March 7 2017

A simple EVA tutorial, part 1

(with the collaboration of T. Antignac, Q. Bouillaguet, F. Kirchner and B. Yakobowski)

This is the first of a series of posts on a new EVA tutorial primarily aimed at beginners (some of the later posts contain more advanced content).

Reminder: EVA is the new name of the Value analysis plug-in.

There is a Value tutorial on Skein-256 that is part of the Value Analysis user manual. The present tutorial is complementary and presents some new techniques available in Frama-C. If you intend to use EVA, we recommend you read the Skein-256 tutorial as well because it details several things that will not be repeated here. (However, it is not required to have read the Skein-256 before this one.)

The source code used in this tutorial is the version 0.3 of Monocypher, a C99-conformant crytographic library that also includes a nice test suite.

Note: newer versions of Monocypher are available! For this tutorial, please ensure you download version 0.3, otherwise you will not obtain the same behavior as described in this tutorial.

Future posts will include more advanced details, useful for non-beginners as well, so stay tuned!

Starting with Frama-C/EVA

This tutorial will use Monocypher's code, but it should help you to figure out how to analyze your code as well. First and foremost, it should help you answer these questions:

  1. Is my code suitable for a first analysis with Frama-C/EVA?
  2. How should I proceed?

(Un)Suitable code

There are lots of C code in the wild; for instance, searching Github for language:C results in more than 250k projects. However, many of them are not suitable candidates for a beginner, for reasons that will be detailed in the following.

Note that you can analyze several kinds of codes with Frama-C/EVA. However, without previous experience, some of them will raise many issues at the same time, which can be frustrating and inefficient.

Here is a list of kinds of code that a Frama-C/EVA beginner should avoid:

  1. Libraries without tests

    EVA is based on a whole-program analysis. It considers executions starting from a given entry point. Libraries without tests contain a multitude of entry points and no initial context for the analysis. Therefore, before analyzing them, you will have to write your own contexts1 or test cases.

  2. Command-line tools that rely heavily on the command-line (e.g. using getopt), without test cases.

    Similar to the previous situation: a program that receives all of its input from the command-line behaves like a library. Command-line parsers and tools with complex string manipulation are not the best use cases for the EVA's current implementation. A fuzzer might be a better tool in this case (though a fuzzer will only find bugs, not ensure their absence). Again, you will have to provide contexts1 or test cases.

  3. Code with lots of non-C99 code (e.g. assembly, compiler extensions)

    Frama-C is based on the C standard, and while it includes numerous extensions to handle GCC and MSVC-specific code, it is a primarily semantic-based tool. Inline assembly is supported syntactically, but its semantics needs to be given via ACSL annotations. Exotic compiler extensions are not always supported. For instance, trying to handle the Linux kernel without previous Frama-C experience is a daunting task.

  4. Code relying heavily on libraries (including the C standard library)

    Frama-C ships with an annotated standard library, which has ACSL specifications for many commonly-used functions (e.g. string.h and stdlib.h). This library is however incomplete and in some cases imprecise2. You will end up having to specify and refine several functions.

1 A context, here, is similar to a test case, but more general. It can contain, for instance, generalized variables (e.g. by using Frama_C_interval or ACSL specifications).

2 A balance is needed between conciseness (high-level view), expressivity, and precision (implementation-level details). The standard library shipped with Frama-C tries to be as generic as possible, but for specific case studies, specialized specifications can provide better results.

Each new version of Frama-C brings improvements concerning these aspects, but we currently recommend you try a more suitable code base at first. If your objective is to tackle such a challenging code base, contact us! Together we can handle such challenges more efficiently.

This explains why Monocypher is a good choice: it has test cases, it is self-contained (little usage of libc functions), and it is C99-conforming.

The 3-step method

In a nutshell, the tutorial consists in performing three steps:

  1. Parsing the code (adding stubs if necessary);
  2. Running EVA with mostly default parameters (for a first, approximated result);
  3. Tuning EVA and running it again.

The initial parsing is explained in this post, while the other steps will be detailed in future posts.

General recommendations

Before starting the use of Frama-C, we have some important general recommendations concerning the EVA plug-in:

  1. DO NOT start with the GUI. Use the command-line. You should consider Frama-C/EVA as a command-line tool with a viewer (the GUI). The Frama-C GUI is not an IDE (e.g. you cannot edit code with it), and EVA does not use the GUI for anything else other than rendering its results.

  2. Use scripts. Even a simple shell script, just to save the command-line options, is already enough for a start. For larger code bases, you will want Makefiles or other build tools to save time.

  3. Use frama-c -kernel-help (roughly equivalent to the Frama-C manpage) and frama-c -value-help to obtain information about the command-line options. Each option contains a brief description of what it does, so grepping the output for keywords (frama-c -kernel-help | grep debug for instance) is often useful. Otherwise, consider Stack Overflow - there is a growing base of questions and answers available there.

  4. Advance one step at a time. As you will see, the very first step is to parse the code, and nothing else. One does not simply run EVA, unless he or she is very lucky (or the program is very simple). Such precautions may seem excessive at first, but being methodical will save you time in the long run.

Parsing the code

Often overlooked, this step is erroneously considered as "too simple" ("just give all files to the command-line!"). In a few cases, it is indeed possible to run frama-c *.c -val and succeed in parsing everything and running EVA.

When this does not work, however, it is useful to isolate the steps to identify the error. Here are some general recommendations:

  1. Start with few files, and include more when needed

    Note that parsing may succeed even if some functions are only declared, but not defined. This will of course prevent EVA from analyzing them. If so, you may have to return to this step later, adding more files to be parsed.

  2. Ensure that preprocessor definitions and inclusions are correct

    Several code bases require the use of preprocessor definitions (-D in GCC) or directory inclusions (-I in GCC) in order for the code to be correctly preprocessed. Such information is often available in Makefiles, and can be given to Frama-C using e.g. -cpp-extra-args="-DFOO=bar -Iheaders".

    -cpp-extra-args is the most useful option concerning this step. It is used in almost every case study, and often the only required option for parsing. Note: old releases of Frama-C did not have this option, and -cpp-command was recommended instead. Nowadays, -cpp-command is rarely needed and should be avoided, because it is slightly more complex to use.

  3. Make stubs for missing standard library definitions

    Frama-C's standard library is incomplete, especially for system-dependent definitions that are not in C99 or in POSIX. Missing constants, for instance, may require the inclusion of stub files (e.g. stubs.h) with the definitions and/or the ACSL specifications. A common way to include such files is to use GCC's -include option, documented here.

  4. Save the result

    Use Frama-C's -save/-load options to avoid having to reparse the files each time. There is no default extension associated with Frama-C save files, although .sav is a common choice. For instance, running:

    frama-c <parse options> -save parsed.sav
    

    will try to parse the program and, if it succeeds, will save the Frama-C session to parsed.sav. You can then open it in the GUI (frama-c-gui -load parse.sav), to see what the normalized source code looks like, or use it as an input for the next step.

Reminder: for the EVA plug-in, the GUI is not recommended for parametrizing/tuning an analysis. It is best used as a viewer for the results.

The default output of EVA is rather verbose but very useful for studying small programs. For realistic case studies, however, you may want to consider the following options:

  • -no-val-show-progress: does not print when entering a new function. This will be the default in Frama-C 15 (Phosphorus);

  • -value-msg-key=-initial-state: does not print the initial state;

  • -value-msg-key=-final-states: does not print the final states of the analysis.

Note the minus symbols (-) before initial-state and final-states. They indicate we want to hide the messages conditioned by these categories.

Parsing monocypher

As indicated above, the naive approach (frama-c *.c) does not work with monocypher:

$ frama-c *.c

[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing)
[kernel] Parsing monocypher.c (with preprocessing)
[kernel] Parsing more_speed.c (with preprocessing)
[kernel] syntax error at more_speed.c:15:
         13
         14    // Specialised squaring function, faster than general multiplication.
         15    sv fe_sq(fe h, const fe f)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
         16    {
         17        i32 f0 = f[0]; i32 f1 = f[1]; i32 f2 = f[2]; i32 f3 = f[3]; i32 f4 = f[4];

The first line is always printed when Frama-C parses a source file, and can be ignored.

The second line indicates that monocypher.c is being parsed.

The third line indicates that more_speed.c is now being parsed, implying that the parsing of monocypher.c ended without issues.

Finally, we have a parsing error in more_speed.c, line 15. That line, plus the lines above and below it, are printed in the console.

Indeed, the file more_speed.c is not a valid C source (sv is not a type defined in that file, and it does not include any other files). But this is not an actual issue, since more_speed.c is not part of the library itself, simply an extra file (this can be confirmed by looking into the makefile). Thus we are going to restrict the set of files Frama-C is asked to analyze.

Note: Frama-C requires the entire program to be parsed at once. It may be necessary to adapt compilation scripts to take that into account.

We also see that the rule for building monocypher.o includes a preprocessor definition, -DED25519_SHA512. We will add that to our parsing command, which will then become:

frama-c test.c sha512.c monocypher.c -cpp-extra-args="-DED25519_SHA512" -save parsed.sav

The lack of error messages is, in itself, a resounding success.

The first part of this tutorial ends here. See you next week!

For now, you can start reading the Skein-256 tutorial available at the beginning of the EVA manual. Otherwise, if you already know EVA (and still decided to read this), you may try to find some undefined behavior (UB) in Monocypher 0.3!

Hint: There is indeed some UB, although it does not impact the code in any meaningful way. At least not with today's compilers, maybe in the future... and anyway, it has been fixed in the newer releases of Monocypher.

Tuesday, December 13 2016

Frama-C Silicon has been released!

Frama-C 14 (Silicon) has just been released. In this post, we present a few additions that should help everyday usage of Value EVA.

Value is now EVA

The Value analysis plug-in has been renamed EVA, for Evolved Value Analysis. It has a new architecture that allows plugging abstract domains, among other features. It is a truly remarkable evolution which this post is too small to contain1, however, so it will presented in more detail later.

1 Facetious reference to Fermat's Last Theorem

Automatic built-in matching for libc functions

One of the new user-friendly features is the -val-builtins-auto option, which avoids having to memorize which built-in to use for each libc function that has one, namely malloc, free, and some floating-point and string-related functions (e.g. pow and strlen).

For instance, consider the following toy program, which simply allocates a buffer, copies a string into it, then allocates a buffer of the right size for the string, and stores it there.

// file.c
#include <stdio.h>
#include <stdlib.h>
#include "string.c" // include Frama-C's implementation of string.h

int main() {
  char *buf = malloc(256); // allocate a large buffer
  if (!buf) exit(1);
  char *msg = "silicon";
  strcpy(buf, msg);
  size_t n = strlen(buf);
  char *str = malloc(n + 1); // allocate a buffer with the exact size (plus '\0')
  if (!str) exit(1);
  strncpy(str, buf, n);
  str[n] = 0;
  size_t n2 = strlen(str);
  //@ assert n == n2;
  free(str);
  free(buf);
  return 0;
}

This program uses dynamic allocation and calls functions from string.h.

The following command-line is enough to obtain an interesting result:

frama-c file.c -val -val-builtins-auto -slevel 7

Without -val-builtins-auto one would need to use this overly long argument:

-val-builtin malloc:Frama_C_alloc_by_stack,free:Frama_C_free,strlen:Frama_C_strlen

For more details about Frama_C_alloc_by_stack, check the EVA manual, section 8.1.

The builtins for free and strlen were automatically chosen by EVA. Note however that strcpy and strncpy do not have builtins. In this case, we include "string.c" (which is actually in share/libc/string.c) to use the implementations available with Frama-C.

Analyzing a program using the implementations in share/libc/string.c is less efficient than using a built-in, but more precise than using only a specification. These implementations are designed to minimize the number of alarms when their inputs are imprecise. Also, because they are not optimized for execution, they are conceptually simpler than the actual libc implementations.

Using these functions. -slevel 7 ensures that their loops are fully unrolled in the example. Can you guess why 7 is the right value here?

Inspecting values in the GUI

Another improvement to usability comes in the form of a popup menu in the GUI. To see it, run the following command using the same file as previously:

frama-c-gui file.c -val -val-builtins-auto -slevel 7

On the Frama-C GUI, click on the str expression in the statement str = (char *)malloc(n + (size_t)1); (corresponding to line 11 in the original source code). Then open the Values tab, and you will see something similar to this:

Show pointed values in the GUI

In the Values tab on the bottom, right-clicking on the cell containing the NULL; &__malloc_main_l11[0] value will show a popup menu "Display values for ...". Clicking on it will add a new column displaying its contents.

Before Silicon, this information was already available, but as the result of a long and painful process. The new popup menu shows one entry per pointed variable in the chosen cell, so if there are several different values, there will be several popup menu entries.

malloc may fail

In the previous example, the values of str are those returned by the malloc builtin: NULL and a newly allocated base (__malloc_main_l11). This models the fact that there may not be enough memory, and malloc may fail. The code should definitely handle this case! But for the hurried evaluator, the use of option -no-val-malloc-returns-null can help focus on the other potential run-time errors (before coming back to the malloc-related ones).

Still ways to go

In this example, there are still some warnings, due to the specification of functions strcpy and strncpy, which use logical constructs not yet evaluated by EVA (but very useful for WP). They are not an issue in this example, since we used the actual implementation of these functions, and therefore do not need their specifications, but future work on EVA will help deal with these details and provide a more streamlined experience.