A simple EVA tutorial, part 2
By André Maroneze on Friday, March 17 2017, 12:00 - Permalink
On the previous post we've seen some recommendations about using Frama-C/EVA, and some tips about parsing. In this post, we will see how to run EVA, and how to quickly setup it for a more precise result.
We will reuse the save file produced at the end of the parsing,
The default parameters of EVA are intended for a fast analysis.
In Frama-C 14 (Silicon), option
-val-builtins-auto is recommended to enable
the usage of built-in functions that improve the precision and sometimes
the speed of the analysis1.
1 This option will be enabled by default in Frama-C 15 (Phosphorus).
The following command-line should result in a relatively quick first analysis:
frama-c -load parsed.sav -val-builtins-auto -val -save value.sav
Note that we save the result in another file. It can be reused as input for another analysis, or visualization in the GUI.
The analysis will likely output many alarms, some due to loss of precision, others due to an incorrect setup. Here are a few important alarms concerning an incorrect setup:
Missing code or specification
file.c:42:[kernel] warning: Neither code nor specification for function foo, generating default assigns from the prototype
There are two major causes for this warning: (1) the file containing the function definition was not given to Frama-C during parsing; or (2) the function has no source code and no ACSL specification was given.
In the first case, the solution is to include the missing source file. Parsing will succeed even if only a declaration (function prototype) is present, but EVA requires more than that. It may be necessary to return to the parsing stage when this arrives.
In the second case, you must supply an ACSL specification for the function, otherwise EVA will assume it has no effect, which may be unsound. To do it with minimal modifications to the original code, you can do the following:
- create a file, say
- copy the function prototype to be stubbed in
stubs.h(adding the necessary
#includes for the types used in the prototype);
- add an ACSL specification to this prototype;
stubs.hin the original source, either by adding
#include "stubs.h", or using GCC's
-cpp-extra-args="-includestubs.h", without spaces between
-includeand the file name).
- create a file, say
Missing assigns clause, or missing
When analyzing functions without source code, EVA imposes some constraints on the ACSL specification: they must contain
assignsclauses, and these clauses must have
\fromdependencies. Otherwise, warnings such as the following may be generated:
foo.c:1:[value] warning: no 'assigns \result \from ...' clause specified for function foo foo.c:3:[value] warning: no \from part for clause 'assigns *out;' of function foo foo.c:6:[kernel] warning: No code nor implicit assigns clause for function foo, generating default assigns from the prototype
The following is an example of an incomplete specification:
/*@ assigns *out; */ void foo(int in, int *out);
Even if it contains an
\assignsclause for pointer
out, it does not say where the result comes from. Adding
\from in, for instance, makes the specification complete from the point of view of EVA.
Note: EVA cannot verify the correctness of the specification in the absence of code, especially that of
ensuresclauses. If you provide an incorrect specification, the result may be unsound. For that reason, it is often useful to write a simplified (or abstract) implementation of the function and then run the analysis. If EVA has both the code and the specification, it is able to check
ensuresclauses and detect some kinds of errors.
Running EVA on monocypher
Running EVA on
parsed.sav will start the value analysis on the
test.c. Due to the large number of small functions in Monocypher,
EVA will output a huge amount of lines, whenever a new function is entered.
-no-val-show-progress will omit messages
emitted whenever entering a new function.
Also, the fact that this code contains lots of small functions with few
or no side-effects is a very strong indicator that
-memexec-all will be very
helpful in the analysis.
Memexec, which is part of EVA, acts as a cache that allows reusing the result of function calls when their memory footprint is unchanged. It dramatically improves performance.
Combining both options, we obtain the following command-line:
frama-c -load parsed.sav -val -val-builtins-auto \ -no-val-show-progress -memexec-all -save value.sav
The analysis will then start and emit several warnings (mainly due to imprecisions). It should finish in a few seconds, depending on your processor speed.
After running the value analysis, it is a good time to check what the result looks like, using the GUI:
frama-c-gui -load value.sav
In the screenshot below, we indicate some parts of the GUI that are useful when inspecting EVA results (besides the source view). We also indicate some parts that are never (or rarely) used with EVA.
Note that the Properties tab (between Console and Values) is not updated automatically: you need to click on the Refresh button before it outputs anything, and after changing filters.
Several tips concerning this panel were presented in a previous post about the GUI. If you follow them, you will be able to make the Properties panel show the total of Unknown (unproven) properties for the entire program, and only those. This number is often similar to the number of messages in the Messages panel.
In Monocypher, after setting the filters to show every Unknown property in the
entire program, and clicking Refresh, we obtain over 900 unproven properties.
Since the analysis was not tuned at all for precision (other than with
-val-builtins-auto), this number is not particularly surprising.
A quick way to improve on results is to use the Loop analysis plug-in.
The Loop analysis plug-in performs a mostly syntactic analysis to estimate loop bounds in the program (using heuristics, without any soundness guarantees) and outputs a list of options to be added to the value analysis. Running EVA again with these options should improve the precision, although it may increase analysis time. Loop analysis' main objective is to speed up the repetitive task of finding loop bounds and providing them as semantic unrolling (
-slevel) counters. The analysis may miss some loops, and the estimated bounds may be larger or smaller, but overall it minimizes the amount of manual work required.
Loop analysis does not depend on EVA, but if it has been run, the results may be more precise. In Monocypher, both commands below give an equivalent result (the difference is not significative in this context):
frama-c -load parsed.sav -loop frama-c -load value.sav -loop
In both cases, Loop analysis' effect is simply to produce a text output that should be fed into EVA for a new analysis:
[loop] Add this to your command line: -val-slevel-merge-after-loop crypto_argon2i \ -val-slevel-merge-after-loop crypto_blake2b_final \ ...
You should, by now, use a shell script or a Makefile to run the Frama-C command
line, adding all the
lines to your command.
Let us consider that the environment variable
LOOPFLAGS contains the result
of Loop analysis, and
EVAFLAGS contains the flags mentioned previously
Then the following command will re-run EVA with a more detailed
(and, hopefully, precise) set of parameters:
frama-c -load parsed.sav $LOOPFLAGS -val $EVAFLAGS -save value2.sav
Opening this file on the GUI will indicate approximately 500 warnings, which is still substantial, but much better than before. Improvements to Loop analysis in the next release of Frama-C will allow this number to be reduced slightly.
The basic tutorial is over, but there are several paths to choose
From here on, there are several possibilities to reduce the imprecisions in the analysis:
Inspect alarms and see if their functions contain loops that were not inferred by Loop analysis; if so, adding their bounds to
-slevel-functioncan improve the precision of the analysis;
Increase the precision using other parameters, such as
Stub libc functions to emulate/constrain inputs when relevant;
Use EVA's abstract domains (e.g.
-eva-equality-domain) to improve precision;
Stop at the first few alarms (
-val-stop-at-nth-alarm), to track more closely the sources of imprecision. However, when there are hundreds of alarms, this is more useful as a learning experience than as a practical solution.
Each solution is more appropriate in a specific situation. Here are a few tips for an intermediate-level user of EVA:
Functions that perform array initialization are often simple (a loop with a few assignments per iteration), so unrolling them completely should not slow down the analysis excessively. The Loop analysis plug-in usually works with them, but some pattern variations may throw it off. You may want to check the proposed values in such loops. Because initialization happens early in program execution, checking such loops may yield good results.
plevelparameter is often used in response to messages such as:
monocypher.c:491:[kernel] more than 200(255) elements to enumerate. Approximating.
where the first number is the current
plevel(by default, 200), and the second number is the amount that would be required to avoid the approximation. In this case,
-plevel 255would be reasonable, but if you had
more than 200(67108864) elements, for instance, it would not be helpful to set the plevel to such a high value.
Stubbing is a good approach when dealing with functions that are closely system-dependent, specifically input functions that read from files, sockets, or from the command-line. Check the Frama-C builtins in
__fc_builtin.h, they provide some useful primitives for abstracting away code with non-deterministic functions.
EVA's domains have specific trade-offs between precision and efficiency, and some have broader applicability than others. Future posts in this blog will describe some of these domains, but as a rule of thumb, two domains that are fairly low-cost and generally useful are
-eva-equality-domain(for syntactic equalities) and
-eva-gauges-domain(for some kinds of loops).
The Messages panel in the GUI is chronologically sorted, so it can help the user follow what the analysis did, to try and identify sources of imprecision. However, even in this case, there is still an advantage to using
-val-stop-at-nth-alarm: because the execution stops abruptly, there are possibly less callstacks displayed in the GUI, and therefore it may be easier to see at a glance which parts of the code were actually executed, and the dependencies between values that lead to the alarm.
The "beginner" tutorial ends here, but one thing that you may have noticed after running EVA, is the dreaded "non terminating function" message at the end of the analysis:
[value:final-states] Values at end of function main: NON TERMINATING FUNCTION
This indicates that, somewhere during the analysis, a completely invalid state was found, and EVA could not proceed. This usually indicates one of the following:
- EVA's setup is incorrect: most likely, some function has missing or incorrect specifications, or some case that cannot be currently handled by EVA (e.g. recursive calls) was encountered.
- A definitively undefined behavior is present in the code, which may or may not lead to an actual bug during execution. In either case, it should be taken care of.
We will see how to handle such situations in the next post, using the GUI and
the Nonterm plug-in (
-nonterm), in a tutorial destined for beginners and
experienced users alike.