By André Maroneze on Thursday, November 15 2018, 12:00
Frama-C chose to participate in the
SATE VI Ockham Sound Analysis Criteria.
With the upcoming Frama-C 18 (Argon) release, results can be reproduced
using the open source version of the Eva plug-in.
This post briefly presents SATE, the Juliet test suite,
and the results obtained with Frama-C 18.
We then describe the
Frama-C SATE VI repository on GitHub,
created to allow anyone to reproduce the results obtained with Frama-C/Eva.
We also explain some patches that were applied to the test suites, and
our choice of parametrization options for Eva.
SATE VI and Juliet 1.3
SATE (Static Analysis Tool Exposition),
organized by NIST, is a comparative study of static analysis tools:
toolmakers volunteer to participate by running their tools on the sets provided
by NIST, and then by submitting their analysis scripts and results.
The NIST SAMATE team then compiles a dense report,
comparing and evaluating all participating tools.
For the analysis of C programs, there are two tracks: one in which all kinds of
tools participate, and another dedicated to sound analyzers, in the sense of the
Ockham Sound Analysis Criteria.
For the Ockham track in SATE VI, we chose the Juliet 1.3 test suite for C/C++,
which contains over 64,000 test cases. Some are written in C++, but most are
written in C. The test suite is organized in directories related to CWEs
(Common Weakness Enumerations): each directory has code samples containing
violations of the CWE. For each test case, there are two variants: one with
good code (which does not trigger the CWE), and another with bad code
(which triggers the CWE in question). In our tests using Juliet, Frama-C is
supposed to find all occurrences of bad code; it may, however, report some
good code as bad.
CWEs handled by Frama-C/Eva
For the analysis with Frama-C/Eva, we chose the subset of CWEs in Juliet 1.3
for C/C++ which are handled by the analyzer.
CWEs in Juliet 1.3 handled by Frama-C/Eva (18 - Argon)
- CWE121 - Stack Based Buffer Overflow
- CWE122 - Heap Based Buffer Overflow
- CWE123 - Write What Where Condition
- CWE124 - Buffer Underwrite
- CWE126 - Buffer Overread
- CWE127 - Buffer Underread
- CWE190 - Integer Overflow
- CWE191 - Integer Underflow
- CWE194 - Unexpected Sign Extension
- CWE195 - Signed to Unsigned Conversion Error
- CWE196 - Unsigned to Signed Conversion Error
- CWE197 - Numeric Truncation Error
- CWE369 - Divide by Zero
- CWE415 - Double Free
- CWE416 - Use After Free
- CWE457 - Use of Uninitialized Variable
- CWE464 - Addition of Data Structure Sentinel
- CWE469 - Use of Pointer Subtraction to Determine Size
- CWE475 - Undefined Behavior for Input to API
- CWE562 - Return of Stack Variable Address
- CWE587 - Assignment of Fixed Address to Pointer
- CWE588 - Attempt to Access Child of Non Structure Pointer
- CWE590 - Free Memory Not on Heap
- CWE617 - Reachable Assertion
- CWE665 - Improper Initialization
- CWE690 - NULL Deref From Return
- CWE758 - Undefined Behavior
- CWE835 - Infinite Loop
- CWE843 - Type Confusion
Note that some CWEs are very broad, such as CWE758 - Undefined Behavior.
We do not claim Frama-C/Eva handles all cases which might be categorized
under such CWEs; however, those considered in the Juliet 1.3 test suite have
all their bad code samples correctly labeled.
Some examples of CWEs which are not considered by Frama-C/Eva include
CWE426 - Untrusted Search Path, CWE90 - LDAP Injection and
CWE780 - Use of RSA Algorithm Without OAEP. All three of them
constitute functional-level weaknesses which do not cause runtime errors.
Some of the CWEs not currently considered by Frama-C/Eva could be handled via
specific function annotations, but this would constitute a very ad hoc solution.
Other plug-ins in the Frama-C platform might be better suited for these CWEs;
e.g., using the WP plug-in with function annotations,
or the Secure Flow plug-in with trusted and untrusted sinks and sources,
might allow identifying the cases with API violations leading to weaknesses.
The following table summarizes the results obtained with Frama-C 18 (Argon).
The script that produces this table is presented
in a later section.
total imprecise : 751
total non-terminating: 3
total ok : 44668
total timeout : 0
total unsound : 0
precision (excluding non-terminating): 0.983465
Each test case is labeled as one of the following cases:
- ok: correctly labeled: either a good test without alarms, or a bad
  test with alarms;
- imprecise: a good test with a false alarm;
- unsound: a bad test without an alarm;
- non-terminating: the program does not terminate; this should never happen,
  except in the case of CWE835 - Infinite Loop;
- timeout: a test which did not finish within the allotted time
  (by default, 30 seconds). This never happens in our benchmarking rig, but
  slower or overloaded machines may see it happen; if so, a larger timeout
  should be given and the test re-run.
The precision (98.3%) displayed in the last line is the number of ok tests
divided by the total number of tests (excluding non-terminating ones).
The improvements between Frama-C 17 (Chlorine) and 18 (Argon) are noteworthy,
especially the fact that the memset builtins are now released in
the open source version. Other improvements in precision also helped remove
some spurious warnings and simplify the analysis script.
The remaining imprecisions are due to different situations, but most of them
occur several times, due to reuse of the same patterns among different test
cases. One source of imprecision is the
snprintf function, whose
specification is not precise enough to ensure that the output string is
null-terminated. This leads to false alarms when that string is used later in
the code. Another case is related to CWE369 - Divide by Zero, in which some
tests use socket-related functions whose specifications do not allow for
precise results in every case.
The Frama-C SATE VI Repository
The Frama-C SATE VI repository on GitHub
allows anyone to run Frama-C on the Juliet 1.3 test suite for C/C++,
to reproduce the results or to try different approaches, e.g. parametrizing
the analysis differently, or trying a new abstract domain.
Software requirements are: Frama-C 18.0 (Argon), GNU Make (>= 4.0), the GNU
parallel tool, Bash (>= 4.0) and
wget (to download the Juliet archive from NIST's website).
The repository is structured in the following way:
- the GNUmakefile allows bootstrapping the repository, by downloading
  the Juliet 1.3 for C/C++ archive from NIST's website and then decompressing it
  (Juliet 1.3 is not distributed in our repository). It also applies
  some patches to Juliet's sources, described
  in a later section;
- the fc directory contains Frama-C scripts used to run the analysis and
  automatically evaluate its results. The main scripts are:
  - analyze.sh: sets the default analysis parameters; these differ from
    Frama-C's defaults in several ways, being stricter in some cases but less
    strict in others;
  - evaluate_case.sh: performs automatic evaluation of good and bad cases
    into the categories presented before;
- the fc/patches directory contains some patches applied to Juliet 1.3
  to fix known issues in a few tests. It also changes the way non-deterministic
  random numbers are used in some tests
  (details are presented later);
- the C directory contains the extracted Juliet 1.3 archive, with its scripts
  and test cases;
- finally, the oracles directory contains a versioned copy of the oracles
  produced by the analysis with Frama-C/Eva; this allows comparing different
  parametrizations, and following the evolution of the platform.
Reproducing the results
The overall process to use the repository is the following:

git clone https://github.com/Frama-C/SATE-VI.git
cd SATE-VI
make

The make step takes a very long time
(2+ hours, depending on your CPU and number of cores).
Note that, if Frama-C is not installed in the PATH, you can use the
fc/frama-c-path.sh script to specify the directory where Frama-C is
installed. This is especially useful to compare different versions of Frama-C.
After running make, inside the C/testcases directory a small summary with
the number of tests performed in each subdirectory will be presented, in the
following format:

CWE121_Stack_Based_Buffer_Overflow/s01: 876 tests
CWE121_Stack_Based_Buffer_Overflow/s02: 1020 tests
CWE121_Stack_Based_Buffer_Overflow/s03: 1020 tests

Running ./summarize_results.sh will also output this result. The
stat_results.py script produces an overall summary across
all subdirectories, as presented in the Results section.
Re-running individual tests
Inside each directory in C/testcases/, it is possible to run a single test,
by taking the name of a source file, removing its extension and adding
_good.res or _bad.res: e.g. if there is a file file.c inside directory
CWE000/s01, then doing make file_good.res will run the test and produce
three files:
- file_good: output of the analysis;
- file_good.csv: CSV report of warnings produced during the analysis;
- file_good.res: result of evaluate_case.sh applied to the test case report.
Note that for tests consisting of multiple files (each ending with a.c, b.c,
etc.), the test case name is their prefix without the letter, e.g. for the
test comprised of files file_42a.c and file_42b.c, the command is
make file_42_good.res (or make file_42_bad.res).
Finally, by replacing .res with .gui in the make command, it is possible to
open the Frama-C GUI on the test case, to inspect it.
Patches applied to Juliet 1.3
In the Frama-C SATE VI repository, there are two sets of patches applied to
Juliet 1.3 for C/C++:
These are fixes in a small number of tests, due to two
patterns which were found to lead to undefined behavior.
The first fix consists in replacing an overflow check which accidentally
introduces a signed overflow:
/* FIX: Add a check to prevent an overflow from occurring */
-if (abs((long)data) <= (long)sqrt((double)CHAR_MAX))
+if (data <= LONG_MAX && labs(data) <= (long)sqrt((double)CHAR_MAX))
Here, data has type unsigned int. In the original code, the call to
abs((long)data) is equivalent to a call with an int argument, since abs
takes an int. If int and long have the same length
(e.g. as in x86 32-bit), then data can be larger than
INT_MAX (and LONG_MAX, since they have the same length), leading to a signed
overflow in the call, and thus an undefined behavior.
The patched version first checks that data is small enough, and then uses
labs to avoid downcasting to int.
The second fix concerns a set of files which had a local variable declared
in a scope smaller than intended:
static void goodG2B()
void * data;
+ int intBuffer = 8;
/* Initialize data */
data = NULL;
/* FIX: Point data to an int */
- int intBuffer = 8;
data = &intBuffer;
CWE843_Type_Confusion__short_68_goodG2BData = data;
In this set of test cases,
intBuffer (and similar variables) was declared
inside a block, and the value was later used outside of the scope of the
variable, leading to undefined behavior. The fix consists in declaring the
variable in the main scope of the function.
This concerns mainly the macros
used in Juliet to produce non-deterministic numbers. These macros are used in
some test cases to assign values to variables of different types. The result
is downcast to short whenever necessary. However, this downcast
leads to warnings due to usage of option -warn-signed-downcast, which is
enabled in the analyze.sh script since some CWEs are
related to it.
The patches applied to the test cases consist in replacing calls to these
macros with new macros defined in a way that prevents overflows from happening.
These new macros are defined in a way very similar to the existing
RAND64() macros, but avoiding the overflows triggering downcast warnings.
To preserve the existing behavior, these macros are equivalent to the original
ones when not running Frama-C.
Finally, minor patches were applied to pretty-printing functions in Juliet's
test case support files:

void printHexCharLine (char charHex)
- printf("%02x\n", charHex);
+ printf("%02hhx\n", charHex);

hh is the correct printf length modifier for a variable of type char, although
it is very rarely used in practice. Still, the Variadic plug-in warns
about the modifier mismatch, so fixing the test case support file prevents
spurious warnings.
Note that all of these patches are intended to improve the precision of the
analysis, but they do not affect its correctness: without these patches,
Frama-C/Eva produces several spurious warnings (and, in a few cases, the
analysis does not terminate, when the only possible behavior is undefined),
but its results remain sound; bad cases are still reported as such.
Rationale for analysis options
The analyze.sh script contains a series of non-default options used in the
analysis. We present here the rationale for choosing these options during
the analysis of Juliet 1.3. Similar options are grouped together.

-no-autoload-plugins -load-module from,inout,report,scope,eva,variadic
These options are present in most of Frama-C's test suites. The
-no-autoload-plugins option simply disables by default all plug-ins,
which speeds up the initialization of Frama-C. We then only load the plug-ins
needed for Juliet: eva and its derived plug-ins (from, inout and scope),
report for the CSV report, and variadic to handle the variadic functions
present in almost every test (e.g. printf).
Overall, this is an optimization for running thousands of tests; when running
only a few dozen tests, the difference is negligible.
-kernel-warn-key parser:decimal-float=inactive \
-kernel-warn-key typing:no-proto=inactive \
-kernel-warn-key typing:implicit-conv-void-ptr=inactive \
-eva-warn-key locals-escaping=inactive
These options disable warnings which are not directly related to runtime
errors, but are enabled by default since they often indicate code quality
issues in real code bases.
The first three warnings could be avoided by patching the code
(e.g. adding prototypes, explicit casts, and avoiding inexact floating-point
constants); the last one is related to addresses of variables remaining
accessible out of their scope (in Eva, their value becomes ESCAPINGADDR),
and it is emitted because it is often easier to identify the moment when the
variable goes out of scope than the later time at which its address may
be used. This is not an undefined behavior, unless the variables are actually
used (which does not happen in the good test cases). In any case, when the
values are accessed (as in the bad test cases), a different warning is
emitted.
These are all options related to minimizing the default verbosity of the
analysis, except for the last, which adds callstacks to the output: when
there are warnings, it is better to know where they come from.
-slevel 300 \
Some amount of slevel is useful to improve the precision of the analysis.
The chosen value is somewhat arbitrary, and varies according to the
characteristics of the code under analysis. In the case of Juliet 1.3, this
value is a good compromise between precision and analysis speed.
-eva-builtin alloca:Frama_C_vla_alloc_by_stack \
The alloca function (which is not in the C standard) is used in several
test cases. In all of them, it behaves just like the allocation of a VLA
(variable-length array), that is, the variables are not used outside of the
scope in which alloca was called. It is therefore safe
to associate the Frama_C_vla_alloc_by_stack builtin with calls to the
alloca function in these tests.
The main difference between alloca and the allocation of a VLA
is that the latter is deallocated at the end of the scope in which it was
declared, while alloca memory blocks are deallocated at the end of the
function in which they are called.
-warn-special-float none \
By default, Frama-C considers non-finite floating-point values as forbidden,
emitting warnings when they may arise, even though this does not constitute an
undefined behavior. This option allows non-finite floating-point values, thus
avoiding spurious warnings.
Signed downcasts (possible loss of information when converting a larger type
to a smaller one) and unsigned overflows, by default, do not constitute
undefined behavior; however, it is a good practice to avoid them, and some
specific CWEs do require treating them as undesired.
Another option that could have been activated is
-eva-symbolic-locations-domain, but in Juliet 1.3 it does not make any
difference, neither in the oracles nor in the overall analysis time.
By default, copying a possibly indeterminate value (e.g. an uninitialized one)
triggers a warning in Eva, even if the value is copied in a standards-compliant
way (e.g. via unsigned char pointers). This helps detect problems where the
copy happens, rather than where the value is later used. However, the usage of
such values already triggers warnings anyway, so we can afford to be less
strict, avoiding spurious warnings in good test cases where the copied values
are never used.
These two abstract domains (enabled via options -eva-equality-domain and
-eva-sign-domain) are efficient enough to be used in most analyses,
and improve the precision of the results at a minimal cost. The equality domain
in particular is useful in some tests which require relational information,
while the sign domain is well suited for testing divide-by-zero errors.
The equality domain eliminates false alarms in 184 tests, and the sign domain
removes 308 false alarms.
This is a technicality, but useful for scripts and versioning: by default,
Frama-C uses relative paths when they are "sufficiently close" to the current
working directory, otherwise absolute paths are used
(e.g. when there is a
.. in the path). To avoid versioning such paths
(and also for creating test oracles), Frama-C has a mechanism to
replace specific paths with user-defined strings. The most common use case is
Frama-C's share directory, written as
FRAMAC_SHARE in reports and in the
GUI. In this case, the actual path to the
C/testcasesupport directory is
replaced with string
replaced with the string TESTCASESUPPORT_DIR. This ensures that every user
has the same oracles for the CSV files mentioned below.
-report-no-proven -report-csv <file>.csv
These options are not present in analyze.sh, but are passed separately for
each test case, where the output filename is available. They invoke the Report
plug-in to generate a CSV file containing the list of properties in the
program, along with their statuses. Option -report-no-proven keeps only
the properties which are left Unknown or Invalid, which are the ones we
are interested in when looking for undefined behaviors.
Conclusion
The NIST SAMATE Juliet 1.3 test suite for C/C++ provides an excellent set of
well-defined programs, as well as programs with isolated bugs, which are very
useful for automated testing and comparative evaluation.
Frama-C/Eva can handle several of the CWEs present in the test suite, and
recent improvements in the analyzer allow it to obtain a very high precision.
The test cases are small, which does not make them representative of large
case studies, but they allowed the identification of limitations in the
analyzer and offered opportunities for improvement.
The analysis with Frama-C/Eva identified very few issues with the code itself
which, given the size of the test suite, is an indicator of its overall
quality.
Overall, participation in the Ockham track enabled Frama-C/Eva to improve its
analysis, and offered a very extensive set of test cases for regression testing
and quantitative experimentation with different parametrizations. For instance,
the sign domain, developed originally as part of a tutorial for Eva, proved
itself very useful to improve the precision of several tests related to
CWE369 - Divide by Zero, with a minimal runtime cost. The large-scale testing
afforded by Juliet enables such kinds of evaluation.
We would like to thank NIST for preparing Juliet and organizing SATE,
and for their commitment to the exposition of innovative tools for
program analysis and verification.