Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - oxygen

Entries feed - Comments feed

Monday, November 19 2012

Funny floating-point bugs in Frama-C Oxygen's front-end

In a previous post, almost exactly one year ago, before Frama-C Oxygen was released, I mentioned that the then future release would incorporate a custom decimal-to-binary floating-point conversion function. The reason was that the system's strtof() and strtod() functions could not be trusted.

This custom conversion function is written in OCaml. It can be found in src/kernel/ in the now available Oxygen source tree. This post is about a couple of funny bugs the function has.


There had been arguments about the inclusion of the “correct parsing of decimal constants” feature in Frama-C's front-end, and about the best way to implement it. My colleague Claude Marché was in favor of using the reputable MPFR library. I thought that such an additional dependency was unacceptable and I was against the feature's inclusion as long as I could not see a way to implement it that did not involve such a dependency.

When I presented my dependency-avoiding solution to Claude, he said: “But how do you know your function works? Did you prove it?”. To which I replied that no, I had not proved it, but I had thoroughly tested it. I had written a quick generator of difficult-to-convert decimal numbers, and I was rather happy with the confidence it gave me.

My generator allowed me to find, and fix, a double rounding issue when the number to convert was a denormal: in this case the number would first be rounded at 52 significant digits and then at the lower number of significant digits implied by its denormal status.

I owe Claude a beer, though, because there were two bugs in my function. The bugs were not found by my random testing but would indeed have been found by formal verification. If you want to identify them yourself, stop reading now and start hunting, because the bugs are explained below.

Stop reading here for a difficult debugging exercise

Say that the number being parsed is of the form numopt1.numopt2Enum3 where numopt1 and numopt2 expand to optional strings of digits, and num3 expands to a mandatory string of digits.

The sequences of digits numopt1 and numopt2 can be long. The string numopt2 in particular should not be parsed as an integer, because the leading zeroes it may have are significant.

At this point of the parsing of the input program, we have already ensured that num3 was a string of digits with an optional sign character at the beginning. In these conditions, it is tempting to simply call the OCaml function int_of_string. The function int_of_string may still fail and raise an exception if the string represents a number too large to be represented as a 31- or 63-bit OCaml int.

This is easy to fix: if the program contains a literal like 1.234E9999999999999999999, causing int_of_string to fail when parsing the exponent, return infinity. A vicious programmer might have written 0.000…01E9999999999999999999, but this programmer's hard-drive is not large enough to contain all the digits that would prevent infinity to be the correct answer.

Similarly, if int_of_string chokes because the programmer wrote 1.234E-9999999999999999999, the function can safely return 0.0 which, for the same reason, is always the correctly rounded floating-point representation.

Or so it would seem. The above logic is implemented in function parse_float inside Frama-C Oxygen, and this is where the bugs are.

Stop reading here for an easy debugging exercise

During a code review, my colleague Boris Yakobowski and I found that Oxygen had the following unwanted behaviors. Read on for the solution.

Exhibit one: spurious warning

double d = 0.0E-9999999999999999999;

For the program above, a warning is emitted whereas the constant is, in fact, exactly represented. The only issue here is the spurious warning:

$ frama-c e1.c
[kernel] preprocessing with "gcc -C -E -I.  e1.c"
e1.c:1:[kernel] warning: Floating-point constant 0.0E-9999999999999999999
is not represented exactly.
Will use 0x0.0000000000000p-1022.
See documentation for option -warn-decimal-float

Exhibit two: confusion between zero and infinity

double d = 0.0E9999999999999999999;

This bug is more serious:

$ frama-c e2.c
[kernel] preprocessing with "gcc -C -E -I.  e2.c"
e2.c:1:[kernel] warning: Floating-point constant 0.0E9999999999999999999
is not represented exactly.
Will use inf.
See documentation for option -warn-decimal-float

These two related bugs are fixed in the development version of Frama-C.

Friday, July 27 2012

Oxygen is stricter about types and why you should get used to it

I have just sent a list of changewishes (1, 2) to a static analysis competition mailing-list, and that reminded me of a blog post I had to write on the strictness of the type-checker in upcoming Frama-C release Oxygen. This is the blog post.

This post is not about uninitialized variables

The static analysis competition in question evaluates each participating tool on many testcases. A good number of these testcases are rejected by the current Frama-C. This is why I was writing to the mailing-list: Frama-C cannot participate in the competition if the testcases are not changed (Frama-C does not have to participate, of course. It's just that it looks fun and we would probably like to be in it).

In fact, many of the testcases were already problematic for Frama-C version 20111001 (Nitrogen), the version in the current Debian, Ubuntu, FreeBSD, NetBSD and other Unix distributions. Indeed, a lot of the testcases rely on uninitialized variables for entropy, which this post by Xi Wang and this post by Mark Shroyer show is wrong. Instead of the problem that is supposed to be detected (or not), Nitrogen detects the uninitialized use. I covered this already; this blog post is not about uninitialized variables (keep reading!).

This post is about C type-checking

While trying to get a list of uninitialized-variable-using testcases, I realized that something had changed since my last evaluation of the competition's benchmarks. Many of them were now rejected at type-checking!

The new problem is, many testcases in the benchmarks call functions without having provided a prototype, and some ultimately define the called function with a type incompatible with the type inferred at the call site. Frama-C used to be lenient about typing issues, but after fixing one soundness bug too many that was caused by the leniency, we have decided to throw in the towel. Virgile described one of the typing issues for which Frama-C used to be too lenient. It was not the only one.

This is an unusual position to take. In the article A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World, the makers of a commercially successful bug-finding tool state:

Our product's requirements roughly form a "least common denominator" set needed by any tool that uses non-trivial analysis to check large amounts of code across many organizations; the tool must find and parse the code, [...]

The above is wise: to be maximally useful, the static analyzer should choke on as few idiosyncrasies as possible. The analyzed code can be assumed to compile with some compiler, and to go through some tests (there must still be a few of these, right)? Why warn when a function is called without a prototype, if the compiler accepted it? Why reject when the function's implementation has a type that is incompatible with the type that was inferred at the call site? This is probably not the issue the user is looking for (if it was, the user would have fixed it when eir compiler warned for it).

Oxygen is strict and that's it

Well, our answer is that we tried and we found that it was too much work to try to be sound and precise with these constraints, as exemplified by Virgile's blog post. Show us a tool that accepts your weakly typed program, and we will show you a tool that probably isn't sound and precise at the same time (we have kept the examples that led to the decision. These examples demonstrate real issues masked by lenient typing. If you think you have found a sound analyzer that is at the same time conveniently permissive on typing, it will be our pleasure to provide the examples for you to try).

We hope that Frama-C will still be useful with these new restrictions on typing. Fortunately for us, there are more real worlds than the expression “the Real World” in the cited article's title might lead you to think (and this quip should not be taken as a reproach towards the authors of “A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World”, a very useful document with an appropriately chosen title considering its intended public).