Frama-C news and ideas


Saturday, December 21 2013

Improving the world for 2014, one programmer at a time

I move that all computing devices be equipped with a small explosive charge, set to explode when someone refers someone else to Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic without the referrer emself having read it.

It would not have to be lethal. I could be content if the apparatus maimed the user into harmlessness.


For what it is worth, the advice I wish I had received that first time, instead of a pointer to Goldberg's essay, is:

  1. Print numbers in hexadecimal format as often as you can (%a in C99; see the example just after this list). Input them that way when it makes sense.
  2. Don't worry, floating-point numbers are just bits. Stop listening to anyone who tries to confuse you with generic reasoning about an arbitrary base β.
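
For instance, here is what %a output looks like, using the shell's printf (which defers to the C library's %a; the exact digits you see may vary with your libc):

$ printf '%a\n' 1.0 0.5 0.1
0x1p+0
0x1p-1
0x1.999999999999ap-4

Seeing at a glance that 0.1 is stored as a finite sum of powers of two, and therefore is not exactly one tenth, is precisely the kind of information that decimal output hides.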


YMMV

Sunday, November 24 2013

C-Reduce and Frama-C

Automatic testcase reduction for compilers: the story so far

A previous post linked to a discussion of manual testcase reduction when the program being debugged is a compiler (and the input demonstrating the faulty behavior is thus a C program). Since then, quite a few related events have taken place:

In a comment to the linked post, someone mentioned automatic testcase reduction with “delta debugging”, and Bruce Dawson, the author of the original post, replied:

Given that the compiler sometimes changed what register it was using I’m not sure that the effort of writing an accurate test for this bug would be worthwhile. It really didn’t take me much time to reduce this bug — half an hour perhaps? Once you don’t need to link the code you can slash-and-burn at a fearsome rate, and my brain is more flexible at recognizing legitimate variations of the bug than any script would be.

And I further commented that on the other hand, a generic “accurate [automatic] test” only needs to be written once, and can serve multiple times. This post intends to show how right Bruce is: subtle pitfalls lurk when writing the acceptance test for automatically reducing a C program. Some of the dangers are theoretical enough when the program being debugged is a compiler, but become more annoying in practice when it is, say, a static analysis framework.

Meanwhile, John Regehr wrote a blog post of his own about the automatic reduction of testcases for compilers. The remarks in his post are quite orthogonal to the remarks in this one. My remarks are in the context of reducing testcases for Frama-C, which is a bit of a misappropriation, and reveals specific issues in the same way that using Csmith to find bugs in Frama-C provided some new insights.

The general idea explained on an example

Consider the example of a file from the Linux kernel that Frama-C's front-end chokes on. The original file causing the problem is 1.5 MiB after pre-processing, and we would like automatic reduction to produce a smaller file that exhibits the same problem, so that we can show it to our Frama-C front-end expert.

The symptom of the problem is the error message below:

include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int

We can build a script that tests for the presence of this line in Frama-C's output. This script will then tell the reducer whether the latter is still on the right path or accidentally erased the interesting behavior.

A first obvious remark is that the test should not check for the presence of “include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int” exactly. If it does, then a reduced program can only pass the test by making it look as if the problem is at that file and line, which will prevent removal of #line directives in the testcase, and also hinder the removal of irrelevant lines that only seem useful because they happen to make the error message come from the expected line.

Instead, the test should be only the presence of “failure: invalid implicit conversion from void to int” in the output of Frama-C. We think we will be happy with any testcase that produces this output, so we should configure the reducer with a test that detects this output and this output only. This idea will be a recurring theme in this post.
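
Concretely, such a test can be a small shell script along the lines of the sketch below (the file name t.i stands for whatever pre-processed file reproduces the problem):

#!/bin/sh
# Interesting iff Frama-C emits the front-end error message,
# wherever in the file it happens to be reported.
frama-c t.i 2>&1 |
  grep "failure: invalid implicit conversion from void to int" > /dev/null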

C-Reduce is the best reducer of C programs out there. C-Reduce takes as input a C program to reduce and a script implementing the interestingness test. A pre-condition is that the initial program has to be interesting. In exchange, C-Reduce produces the smallest interesting program that it can, removing and moving around bits of the original program and checking that the result remains interesting. When applied with the aforementioned 1.5MiB file from the Linux kernel and with the interestingness test frama-c t.i | grep "failure: invalid implicit conversion…", C-Reduce produces the minimal file below:

fn1 ()
{
    0 ? 0 * 0 ? : 0 : 0;
}

This superficially looks like something that we could show to our Frama-C front-end expert, although we will generate a better reduced program later in the post. If we make the mistake of grepping for "delay.h:39:[[]kernel[]] failure: invalid implicit conversion…" instead, C-Reduce still does a good job, preserving only a single additional line that happens to cause the warning to be localized where the interestingness test expects it:

#37 "include/linux/delay.h"
fn1 ()
{
    0 ? 0 * 0 ? : 0 : 0;
}

C-Reduce's only quirk may be that it sometimes reduces too much. In both reduced programs above, the function fn1() is implicitly typed as returning int according to now obsolescent C90 rules. In addition, the syntax e1 ? : e2 is not standard C but a GCC extension that happens to be accepted by the Frama-C front-end.

We have been lucky

We have in fact been lucky in the above reduction: the reduced program looks like something that should be accepted by Frama-C (despite the implicit int return type and the strange ? : syntax), and instead of accepting it, Frama-C emits an error message. This is exactly what we were hoping to obtain from the reduction process.

Consider the eventuality of the reduced program being an invalid program, incorrectly converting a void expression to int. Frama-C would be behaving correctly by emitting its warning on the reduced program. C-Reduce would have behaved according to specification, but we would be left without anything to show our front-end expert. We would have been had.

This is not a problem in C-Reduce, but a problem with us not using it properly. The reason the original 1.5MiB program is interesting is that it is accepted by GCC and rejected by Frama-C. This is the criterion the interestingness test should implement. When the interestingness test passed to C-Reduce implements an approximation of the notion of interestingness we really desire, we have only ourselves to blame if C-Reduce turns up an invalid reduced file that Frama-C correctly rejects.
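
A sketch of an interestingness test that implements this criterion directly (same placeholder file name as before) could be:

#!/bin/sh
# Interesting iff GCC accepts the file and Frama-C rejects it
# with the specific front-end error message.
gcc -c t.i -o /dev/null 2> /dev/null || exit 1
frama-c t.i 2>&1 |
  grep "failure: invalid implicit conversion from void to int" > /dev/null

With this version, a candidate that Frama-C rightly rejects because it is not valid C in the first place is discarded instead of being kept.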

Stop reducing so much, C-Reduce!

We do not want the Frama-C front-end expert who receives the bug report to be sidetracked by considerations of obsolete syntax or strange extensions. The best testcase to show em is a short but perfectly standard C program. We can make this preference part of our interestingness test.

The first idea may be to reject any file that is not strictly C99-compliant. That would be a file that gcc -std=c99 -pedantic accepts to compile, I think. However, the unreduced testcase here is a file from the Linux kernel, and it does not pass this test.

Delta debugging is for preserving interestingness along size-reduction steps, not for making interestingness appear. Therefore, we cannot demand a reduced file that passes gcc -std=c99 -pedantic if the original file to start from does not pass it either.

Note: techniques borrowed from C-Reduce and from genetic programming might make interestingness appear out of uninteresting C files. Structural coverage of the program under test, beyond the ability to crash it, may make a useful, non-boolean fitness function. Or better, for a chosen assertion inside the program under test, coverage of the statements that lead to the assertion. This process would be different from delta debugging but might generate files that reveal bugs in static analysis frameworks. I hope someone will look into it someday.

However, simply compiling the tentative reduced file we already have with well-chosen compiler options reveals that GCC can already tell this file is not perfect. It emits the warning snippets forbids omitting the middle term (regarding the syntax extension for ? :) and warning: return type defaults to (regarding implicit return types of functions). Our original file, although it originates from the Linux kernel, is not so ugly that it causes these warnings. Consequently, we can test for the absence of these warnings in an interestingness test that still accepts the original file and will guide C-Reduce towards a clean reduced file, as sketched below.
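
In shell terms, the interestingness test could grow into something like this (a sketch: the GCC options are a guess at the “well-chosen” ones, and the warning texts should be adjusted to what your GCC version actually prints):

#!/bin/sh
# Interesting iff GCC accepts the file without the two unwanted
# warnings and Frama-C still emits the front-end error message.
gcc -pedantic -Wall -c t.i -o /dev/null 2> gcc.log || exit 1
grep "forbids omitting the middle term" gcc.log > /dev/null && exit 1
grep "return type defaults to" gcc.log > /dev/null && exit 1
frama-c t.i 2>&1 |
  grep "failure: invalid implicit conversion from void to int" > /dev/null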

If we do this, we obtain:

/*  */ void
fn1 ()
{
    0 ? 0 * 0 ? 0 : 0 : 0;
}

The Frama-C issue at hand was reported using the reduced C program as evidence, and was promptly fixed by Virgile Prevosto, Frama-C front-end expert extraordinaire.

C-Reduce is now parallel

The latest version of C-Reduce can explore several possible reductions in parallel. This is particularly interesting when the program under test takes a long time. A bug deep in the value analysis plug-in may take minutes to occur in the original program, and C-Reduce typically launches thousands of instances in the process of reducing a large program to a small one.

However, it does not pay to run too many instances of memory-bandwidth-limited programs in parallel. The computer I ran this example on has 4 cores and two execution threads per core (it relies on hyper-threading). The system thus sees 8 processors. By default, C-Reduce launches up to 8 instances in parallel. Is this the best choice?
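
For reference, the level of parallelism is chosen on the C-Reduce command line, along these lines (a sketch: test.sh stands for the interestingness script, and the exact spelling of the option may differ between C-Reduce versions):

$ creduce -n 3 ./test.sh t.i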

If I use C-Reduce's option -n to choose different levels of parallelism instead, I obtain, for this particular memory-intensive(*) program being debugged (Frama-C) and this particular initial C program, the following timings:

# instances    Time (s)
 1               620
 2               375
 3               346
 4               367
 5               396
 8               492

For this particular use, C-Reduce's default of using all available execution threads is faster than using only one execution thread. It is also slower than all other parallelization choices. The fastest reduction is obtained for 3 parallel instances, and it is almost twice as fast as the non-parallel version.

There is some randomness to the reduction algorithm, however, and the experiment should be repeated with varying initial programs before conclusions are drawn.

(*) Frama-C can sometimes use as much as half the memory a modern web browser typically uses. Imagine!

Making each instance terminate earlier

Even with an optimal choice of the degree of parallelism, automatic reduction can be time-consuming. A sub-optimal interestingness test for a bug that takes several minutes to appear in Frama-C can make the entire reduction take days, or exhaust the operator's patience altogether. This is only machine time we are talking about, but sometimes a human is waiting on the result of the reduction. In this section, we point out two useful tricks to bear in mind, especially when the program under test can take a long time for some inputs. The fact that the program under test takes a long time can be the problem being investigated, but does not have to be.

First trick: using grep option -l

Often, as soon as a particular diagnostic message has been emitted, it is possible to conclude that the testcase is interesting. In the example above, as soon as the message “failure: invalid implicit conversion…” has been seen in Frama-C's output for a C program that GCC otherwise accepts, the input C program is known to be interesting.

By default, a shell script such as frama-c … test.c | grep "interesting message" allows Frama-C to finish its analysis before returning a boolean value. A simple improvement is to use grep's -l option, which causes grep to exit just after the matching line has been seen.

Contrast these two Unix sessions:

$ cat | grep interesting ; echo $?
nothing to see here
still nothing
this is interesting
this is interesting     ← this matching line is repeated by grep
one more line
two more lines
three more lines
cat is taking a long time
...
^D
0   ← error code of grep, printed by echo, indicating an interesting line was seen

In the above session, the grep command was still scanning its input after an interesting line had been seen, until I indicated that the “analysis” was finished with control-D. Below, I repeat the experiment, this time using grep -l to detect the keyword “interesting”.

$ cat | grep -l interesting ; echo $?
nothing to see here
still nothing
this is interesting
(standard input)     ← grep indicates a matching line has been seen on stdin
one more line
0   ← error code of grep, printed by echo, indicating an interesting line was seen

Thanks to the -l option, the cat command was terminated abruptly before it had finished, when emitting the line immediately after the interesting line (cat was terminated for trying to write to a closed file descriptor). Actually, this is a slight chink in the armor: the program whose output is piped into grep needs to write one more line after the interesting line in order to be terminated. If the interesting behavior is Frama-C taking a long time for some inputs, and the interesting log line is one that says “warning: things are going to take an infinite time from now on”, and nothing is ever printed after that line, Frama-C will not be terminated. My Unix-fu is too weak for me to offer a solution where the program under test is terminated as soon as possible, but the next subsection offers a palliative anyway.

Second trick: using a timeout

The initial input program may fail (i.e. “be interesting”) within, say, 5 minutes. This does not prevent slight variations of the program from taking forever to fail, or forever to succeed. In either case, we do not want the reduction process to waste days of computation testing a single variation. We want the reduction process instead to try as many variations as possible quickly in order to converge to a short one as soon as possible. The strength of delta debugging is in iteration.

A simple idea here is to define the interestingness property as “the C program causes the interesting message to be emitted by Frama-C within 10 minutes”. Any program that has already been analyzed for 11 minutes is uninteresting by this definition. This property can be implemented simply with the Unix command ulimit.

Strictly speaking, the interestingness property should be completely reproducible, whereas this one is not. However, C-Reduce still works very well in practice with such a limit in place in the interestingness test. To limit the risk of breaking internal C-Reduce assertions, the time the initial program takes to be interesting should be multiplied by a safety factor. Because C-Reduce launches several instances in parallel, each instance runs a bit slower than if it was alone, and execution times can vary a bit. I have never had any trouble using as the timeout twice the time the initial program takes to fail.
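
Putting the two tricks together, the complete interestingness test might end up looking like the sketch below (the 600-second CPU-time limit corresponds to the 10-minute budget above, and would in practice be set to roughly twice the time the initial program takes to fail):

#!/bin/sh
# Interesting iff GCC accepts the file and Frama-C prints the message
# within the CPU-time budget; grep -l stops reading the output as soon
# as the message has been seen.
gcc -c t.i -o /dev/null 2> /dev/null || exit 1
( ulimit -t 600; exec frama-c t.i 2>&1 ) |
  grep -l "failure: invalid implicit conversion from void to int" > /dev/null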

Automating the manual

Let us take a step back and look at what we have been doing above. We started with one obvious interesting property, and we progressively refined it in order to delineate the really interesting programs. We initially thought we were interested in C programs that cause Frama-C to emit the message “include/linux/delay.h:39:[kernel] failure: invalid implicit conversion from void to int”, but it eventually turned out that we really wanted Frama-C to emit “failure: invalid implicit conversion from void to int” within 10 minutes for a C program that GCC compiles without emitting warnings about ? : syntax extensions or implicit return types.

We have been obtaining the additional sub-properties in a trial-and-error fashion: we did not know that we wanted any of them until C-Reduce produced a program that did not have it. Still, I believe that:

  1. These additional sub-properties would always come from the same repertoire in actual C-Reduce practice. It is possible to build a large catalog of these desirable properties so that the end user does not need to re-discover each of them.
  2. Before the reduction per se even starts, the original program should be tested for all the above niceness properties. All the niceness properties the original program has should be preserved in the reduced program by making them part of the interestingness test. That is, if the original program does not warn about implicit return types, the reduced program should not warn about implicit return types. On the other hand, if the original program does warn about implicit return types, then the bug being investigated may be about implicit return types, so it is not possible to require the reduced program to avoid the warning.
  3. Still, it would be a useful feature if the reducer could detect that a niceness property has appeared during reduction, and strive to preserve it from then on. This was perhaps easy to implement in the early non-parallel version of C-Reduce: it looks like it could be done entirely by side-effects in the interestingness test. Since then, C-Reduce has been parallelized, which greatly helps with reduction speed but means that implementing this feature becomes more complicated, if it is at all desirable.

Conclusion

This post collects a number of useful tricks for using C-Reduce, in particular when reducing bugs for Frama-C. C-Reduce is already immensely useful as it is, but this post also indicates some future work directions to make its use more automatic and more convenient yet.

Saturday, October 26 2013

Jakob Engblom on the Toyota Acceleration Case

We interrupt our regularly scheduled program to link to this post by Jakob Engblom. The post was prompted by the jury's conclusion, in a lawsuit over what appears to be an automotive software glitch, that the carmaker is liable.

Monday, October 21 2013

Bruce Dawson on compiler bugs

Bruce Dawson has written a superb blog post on a Visual C++ compiler bug (now fixed), covering every aspect an essay on compiler bugs should cover.

I really like one section, that I am going to quote in full:

Security

In these paranoid days of the NSA subverting every computer system available every bug is a possible security flaw. This bug seems curiously commonplace – why wasn’t it discovered when building Windows or other 64-bit products? The most likely explanation is chance and happenstance, but compiler bugs can be used to make innocuous source code do surprising things. Maybe this bug is used as part of a back door to let code execution go in a direction that the source code says is impossible. This happened accidentally on the Xbox 360 where the low 32 bits of r0 were checked and then all 64 bits were used. This inconsistency allowed a hypervisor privilege escalation that led to arbitrary code execution.

This bug is probably not security related, but security is another reason to ensure that compiler bugs get fixed.


Bruce's post also illustrates the “most compiler bugs aren't” problem and provides a step-by-step tutorial on manual bug reduction. Regarding the latter, my reaction to Bruce's post was to bring up automatic testcase reduction with “delta debugging”, but Gravatar user jrrr beat me to it.

Frama-C contributes to making automatic testcase reduction work in practice for C compiler bugs by providing a reference C implementation that detects the undefined behaviors that could make the (original or reduced) bug reports invalid. We also use delta debugging internally to simplify our own reports of bugs in Frama-C. The next post in this blog will illustrate this use of delta debugging on a concrete example.

Friday, October 18 2013

OCaml's option -compact can optimize for code size and speed

The OCaml compiler can generate native code for various architectures. One of the few code generation options is named -compact:

$ ocamlopt -help
Usage: ocamlopt <options> <files>
Options are:
  ...
  -compact  Optimize code size rather than speed
  ...

One of the effects, perhaps the only effect(?), of this option is on the code generated for allocation. In OCaml, a small block typically gets allocated from the minor heap. When all goes well, allocating in the minor heap means simply decrementing the pointer that separates the already-allocated top of the minor heap from the still-available bottom of the minor heap. Eventually the minor heap becomes full; what happens then is complicated. But the fast path is basically a subtraction and a conditional branch (usually not taken) to the complicated stuff.

With option -compact, an allocation simply calls a routine that does all the above. Without it, the subtraction and conditional jump are inlined at the site where the allocation is requested. OCaml being a typical functional language, there are plenty of these. Inlining a couple of instructions makes the code bigger, but when execution stays on the fast path, which is often, a routine call is avoided completely.

A small example

Consider the one-line OCaml function let f i = Some i. This function takes a value i and returns a Some memory cell that points to i. I intended i to be an integer, but I inadvertently wrote code that works for any type of i. This was possible because in OCaml, all types of values share a uniform representation, as discussed in the introduction of a previous post.

A Some cell occupies 16 bytes (yes, this is huge. Blame 64-bit computing). Here is the fast path in the code generated by ocamlopt -S t.ml, where L102 is the label at which seldom-called code handles the case when the minor heap is full:

	...
	subq	$16, %r15
	movq	_caml_young_limit@GOTPCREL(%rip), %rax
	cmpq	(%rax), %r15
	jb	L102
	...

The -compact option causes more compact code to be emitted for the allocation. When our function f is compiled with ocamlopt -S -compact t.ml, the sequence generated for the allocation is much shorter indeed. There is even a specialized allocation function for allocating 16-byte blocks, so no arguments need to be set up:

	...
	call	_caml_alloc1
	...

Effects of option -compact in the large

When the development version of Frama-C is compiled with option -compact, the binary takes 14.4MB.

Without option -compact, the Frama-C binary weighs in at 15.4MB.


One additional megabyte of code is not going to hurt us at this point, right? It is not as if the Frama-C binary had to fit exactly in a box of ten floppy disks. As long as the larger code is faster, it is worth it…

$ time  bin/toplevel.lean_and_mean tests/test/adpcm.c -sparecode > /dev/null

user	0m0.843s

$ time  bin/toplevel.big_and_slow tests/test/adpcm.c -sparecode > /dev/null

user	0m0.856s

… but it turns out that the larger code isn't faster. In the case of Frama-C, and of the computer I am typing on, at least, the code generated with option -compact is smaller and faster.

Conclusion

It is not easy to predict performance on modern out-of-order architectures, but if I had to hazard an explanation, I would say that, besides the obvious cache concerns, having many scattered predictable conditional branches may be a losing proposition compared to one single conditional branch visited from many places with call and ret. Every conditional branch that isn't in the branch prediction buffer is at risk of being mispredicted. In addition, each entry taken by the branches that have recently been visited is an entry that isn't available for branches translated from conditionals in the source code.

In contrast, the execution of the call and ret instructions has been optimized over the years. The target of the ret instruction is predicted with a Return Stack Buffer that has a conceptually simpler job than the Branch Predictor (although it is clearly possible to write code for which the ret instructions cause a stall, which could re-establish an advantage for the partially inlined version).


The conclusion is that everyone who uses OCaml should try the native compiler's option -compact, and that perhaps the default should be for this option to be on.

Wednesday, October 9 2013

The overflow when converting from float to integer is undefined behavior

Integer overflows in C

A previous post on this blog was a reminder that in C, signed integer arithmetic overflow is undefined behavior. In contrast, the behavior of overflows in conversions from integer type to signed integer type is implementation-defined. The C99 standard allows for an implementation-defined signal to be raised, but in practice, the widespread compilation platforms provide two's complement behavior. And you can trust that they will continue to do so, because it's implementation-defined. Compiler makers cannot change their mind willy-nilly as if it was undefined behavior:

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Floating-point overflows in C

The C standard does not mandate IEEE 754 floating-point arithmetic. Still, in practice, modern compilation platforms, if they provide floating-point features at all, provide either exactly IEEE 754 binary32 and binary64 formats and computations, or the same formats and a close approximation of the same computations.

IEEE 754 floating-point defines +inf and -inf values, so that any real number can be approximated in the target IEEE 754 format (albeit, when it ends up represented as an infinity, not precisely). This means that for C compilation platforms that implement IEEE 754 for floating-point, the condition “the value can be represented in the new type” is always true. There is no reason to worry about undefined behavior caused by overflow in either floating-point arithmetic or in the conversion of a double to a float.

Or indeed, in a constant. Consider GCC's warning here:

$ cat t.c
#include <stdio.h>

int main()
{
  double big = 0x1.0p5000;
  printf("%f\n", big);
}

$ gcc-172652/bin/gcc -std=c99 -Wall t.c && ./a.out 
t.c: In function ‘main’:
t.c:5:3: warning: floating constant exceeds range of ‘double’ [-Woverflow]
inf

The number 2^5000, represented in C as 0x1.0p5000, is totally in the range of double, which goes up to inf. Clang similarly warns that “magnitude of floating-point constant too large for type double”. A proper warning message would be that 2^5000 cannot be represented precisely, instead of implying that it cannot be represented at all.

Floating-point ↔ integer conversion overflows in C

But enough pedantry contests with compilers. The range of floating-point representations being what it is, we are left with only overflows in conversions from floating-point to integer to consider.


Suspense… (for the reader who did not pay attention to the title)


Overflows in conversions from floating-point to integer are undefined behavior. Clause 6.3.1.4 in the C99 standard makes them so:

6.3.1.4 Real floating and integer

1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

What can happen in practice when a C program invokes this particular flavor of undefined behavior? Is it as bad as dereferencing an invalid address, as mostly harmless as signed integer arithmetic overflow, or something else entirely? Let us find out.

The following program converts the double representation of 2^31, the smallest positive integer that does not fit a 32-bit int, to int.

int printf(const char *, ...);

int main()
{
  int i = 0x1.0p31;
  printf("%d\n", i);
}

Frama-C's value analysis warns about undefined behavior in this program:

$ frama-c -val t.c

warning: overflow in conversion of 0x1.0p31 (2147483648.) 
   from floating-point to integer.
   assert -2147483649 < 0x1.0p31 < 2147483648;

Fine-tuning the assertion -2147483649 < 0x1.0p31 < 2147483648 was a riot, by the way. Do you see why?

My aging (but still valiant) PowerPC-based Mac appears to think that saturation is the way to go: the variable i is set to INT_MAX:

$ gcc -std=c99 t.c && ./a.out 
2147483647

Dillon Pariente was first to draw our attention to overflow in floating-point-to-integer conversions, which caused CPU exceptions on the target CPU for the code he was analyzing. I understood that target CPU to also be a PowerPC, so I suspect the behavior must be configurable on that architecture.

Dillon Pariente's example was along the lines of float f = INT_MAX; int i = f; which is also hilarious if you are into that sort of humor.


In order to really show how weird things can get on Intel processors, I need to modify the test program a bit:

int printf(const char *, ...);

volatile double v = 0;

int main()
{
  int i1 = 0x1.0p31;
  int i2 = 0x1.0p31 + v;
  printf("%d %d\n", i1, i2);
}

The volatile type qualifier precludes optimization, but there is no hardware or thread to change the value of variable v. The two expressions 0x1.0p31 and 0x1.0p31 + v are both expressions of type double that evaluate to 2^31.

Still, GCC and Clang, as if they were a single compiler, think that these two expressions need not result in the same value when converted to int:

$ gcc t.c && ./a.out 
2147483647 -2147483648
$ clang  t.c && ./a.out 
2147483647 -2147483648

The results are different because one conversion was evaluated statically to be placed in %esi (2147483647) whereas the other was evaluated at run-time in %edx with the cvttsd2si instruction:

$ clang -S -O t.c  && cat t.s
...
_main:                                  ## @main
...
	movsd	_v(%rip), %xmm0
	addsd	LCPI0_0(%rip), %xmm0
	cvttsd2si	%xmm0, %edx
	leaq	L_.str(%rip), %rdi
	movl	$2147483647, %esi       ## imm = 0x7FFFFFFF
	xorb	%al, %al
	callq	_printf
...
L_.str:                                 ## @.str
	.asciz	 "%d %d\n"

Only undefined behavior allows GCC and Clang to produce different values for i1 and i2 here: the values of these two variables are computed by applying the same conversion to the same original double number, and should be identical if the program was defined.


Generally speaking, cvttsd2si always produces -0x80000000 in cases of overflow. That is almost like saturation, except that floating-point numbers that are too positive are wrapped to INT_MIN. One may think of it as saturating to either -0x80000000 or 0x80000000, and in the latter case, wrapping around to -0x80000000 because of two's complement. I do not know whether this rationale bears any resemblance to the one Intel's engineers used to justify their choice.


So one might think that this is the end of the story: as long as the conversion is done at run-time on an Intel platform, the compiler uses the cvttsd2si instruction. Overflows, if overflows there are, “saturate to INT_MIN” as the convention is on this platform. This can be confirmed experimentally with the following program variant:

#include <stdio.h>
#include <stdlib.h>

int main(int c, char **v)
{
  int i = 0x1.0p31 + strtod(v[1], 0);
  printf("%d\n", i);
}

This new program takes a number from the command-line and adds it to 2^31, so that there is no opportunity for compile-time evaluation. We expect the conversion to saturate to INT_MIN, and it does:

$ gcc -std=c99 t.c && ./a.out 1234 && ./a.out 12345 && ./a.out 123456
-2147483648
-2147483648
-2147483648


Wait! It gets more amusing still. Let us change the program imperceptibly:

int main(int c, char **v)
{
  unsigned int i = 0x1.0p32 + strtod(v[1], 0);
  printf("%u\n", i);
}

The behavior of run-time overflow in the conversion from double to integer changes completely:

$ gcc -m64 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999 
1234
123456
3755744407

But conversion saturates again, at zero this time, for the same program, when targeting IA-32:

$ gcc -m32 -std=c99 t.c && ./a.out 1234 && ./a.out 123456 && ./a.out 12345678999
0
0
0

Do you have an explanation for this one? Leave a message in the comments section below. The fastest author of a complete explanation wins a static analyzer license.

Conclusion

In conclusion, the overflow in the conversion from floating-point to integer is rather on the nasty side of C's undefined behavior spectrum. It may appear to behave consistently if the compilation targets an architecture where the underlying assembly instruction(s) saturate. Saturation is the behavior that compilers GCC and Clang implement when they are able to evaluate the conversion at compile-time. In these conditions, a lucky programmer may not actually observe anything strange.

The idiosyncrasies of other architectures may lead to very different results for overflowing conversions depending on parameters outside the programmer's control (constant propagation, for instance, is more or less efficient depending on the optimization level and may be difficult to predict, as we already complained about when discussing Clang targeting the 387 FPU).


Acknowledgements: In addition to Dillon Pariente, I discussed this topic with Boris Yakobowski, John Regehr, Stephen Canon, and StackOverflow users tenos, Sander De Dycker and Mike Seymour prior to writing this blog post.

Wednesday, September 25 2013

The problem with differential testing is that at least one of the compilers must get it right

A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations, and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing where all compilers get it wrong are.

A simple program to find double-rounding examples

The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the result of the multiplication of two double operands differs between the double-precision multiplication (the result is rounded directly to what fits the double format) and the extended-double multiplication (the mathematical result of multiplying two double numbers may not be representable exactly even with extended-double precision, so it is rounded to extended-double, and then rounded again to double, which changes the result).

$ cat dr.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <limits.h>

int main(){
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while(1){
    double d1 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    double d2 = ((unsigned long)rand()<<32) +
                           ((unsigned long)rand()<<16) + rand() ;
    long double ld1 = d1;
    long double ld2 = d2;
    
    if (d1 * d2 != (double)(ld1 * ld2))
      printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n", 
	     d1, d2, d1*d2,
	     d1, d2, (double)(ld1 * ld2));
  }
}

The program is platform-dependent, but if it starts printing something like below, then a long list of double-rounding examples should immediately follow:

0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380

Results

In my case, what happened was:

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
...
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following while (1) with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:

$ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
$ clang -std=c99 -O2 -Wall dr.c && ./a.out 
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C

Conclusion

It became clear what had happened when looking at the assembly code:

$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm4, %xmm5
	ucomisd	%xmm5, %xmm5
	jnp	LBB0_1
...

Clang had compiled the test for deciding whether to call printf() into if (xmm5 != xmm5) for some register xmm5.

$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
	mulsd	%xmm1, %xmm2
	ucomisd	%xmm2, %xmm2
	jnp	LBB1_1
...

And GCC had done the same. Although, to be fair, the two compilers appear to be using LLVM as back-end, so this could be the result of a single bug. But this would remove all the salt of the anecdote, so let us hope it isn't.


It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.

Monday, September 2 2013

The case for formal verification of existing software

Perry E. Metzger takes a look at formal verification. This is good stuff; there is a lot to agree with here.

However, agreeing with Perry's post alone would not make a very interesting counterpoint. If agreeing was the only thing I intended to do, I might not even have written this post. Instead I intend to add, and I apologize in advance for the predictability of my views, that while creating formally verified software from scratch is useful, verifying existing software is useful too.

Yes, formally verified software written from scratch can now be large enough in scope to be a compiler or a microkernel, but when verifying existing software, we can tackle the problem from the other end: we can pick any useful software component, and verify that. We can pick a software component so useful that it is already used by millions. If we succeed in verifying it, we have put formally verified software in the hands of millions of satisfied users. Transparently.


Take the example of the SSL implementation I am taking a couple of weeks to finish massaging through Frama-C. It is not as wide in scope as Quark, seL4 or CompCert. Neither am I aiming for the same kind of functional correctness as these projects are: I am only verifying the absence of undefined behaviors in the component, and verifying the functional dependencies of the cryptographic primitives(*).

But PolarSSL is useful. Plus, by its very nature, it is exposed to security attacks (SSL is the protocol behind HTTPS). And the former three examples are full-blown research projects, in contrast to my single person-month effort.

The bonus is that the formally verified PolarSSL can be compiled and embedded with all the same compilers and into all the same firmwares as the earlier non-verified version. It is the same as the non-verified version except maybe for a couple of bugfixes, and the confidence that for an identified usage pattern, no more undefined behavior bugs will ever need to be fixed.

All in all, the formal verification of existing code, despite its differences from its “from scratch” counterpart, has become a force to be reckoned with too.


(*) translation: I use Frama-C option -deps and I compare the result to what I expected


Acknowledgement: I got the link to Perry E. Metzger's post through Toby Murray.

Saturday, August 24 2013

Function pointers in C

This post contains a complete list of everything a C program can do with a function pointer, for a rather reasonable definition of “do”. Examples of things not to do with a function pointer are also provided. That list, in contrast, is in no way exhaustive.

What a C program can do with a function pointer

Convert it to a different function pointer type

A function pointer can be converted to a different function pointer type. The C99 standard's clause 6.3.2.3:8 starts:

“A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer.”

Call the pointed function with the original type

Clause 6.3.2.3:8 continues:

“If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.”

Alright, so the above title is slightly sensationalistic: the pointed function can be called with a compatible type. After typedef int t;, the types t and int are compatible, and so are t (*)(t) and int (*)(int), the types of functions taking a t and returning a t and of functions taking an int and returning an int, respectively.

There is no third thing a C program can do with a function pointer

Seriously. The C99 standard has uintptr_t, a recommended integer type to convert data pointers to, but there is not even an equivalent integer type to store function pointers.

What a C program cannot do with a function pointer

Convert it to an ordinary pointer

Function pointers should not be converted to char * or void *, both of which are intended for pointers to data (“objects” in the vocabulary of the C standard). Historically, there have been plenty of reasons why pointers to functions and pointers to data might not have the same representation. With 64-bit architectures, the same reasons continue to apply nowadays.

Call the pointed function with an incompatible type

Even if you know that type float is 32-bit, the same as int on your platform, the following is undefined:

void f(int x);

int main(){
  void (*p)(float) = f;
  (*p)(3);
}

The line void (*p)(float) = f;, which defines a variable p of type “pointer to function that takes a float”, and initializes it with the conversion of f, is legal as per 6.3.2.3:8. However, the following statement, (*p)(3); is actually equivalent to (*p)((float)3);, because the type of p is used to decide how to convert the argument prior to the call, and it is undefined because p points to a function that requires an int as argument.


Even if you know that the types int and long are both 32-bit and virtually indistinguishable on your platform (you may be using an ILP32 or an IL32P64 platform), the types int and long are not compatible. Josh Haberman has written a nice essay on this precise topic.

Consider the program:

void f(int x);

int main(){
  void (*p)(long) = f;
  (*p)(3);
}

This time the statement is equivalent to (*p)((long)3);, and it is undefined, even if long and int are both 32-bit (substitute long and long long if you have a typical I32LP64 platform).


Lastly, the example that prompted this post was, in a bit of Open-Source code, the creation of a new execution thread. The example can be simplified into:

void apply(void (*f)(void*), void *arg)
{
  f(arg);
}

void fun(int *x){
  // work work
  *x = 1;
}

int data;

int main(){
  apply(fun, &data);
}

The undefined behavior is not visible: it takes place inside function apply(), which stands in for a library function (it was pthread_create() in the original example). But it is there: the function apply() expects a pointer to a function that takes a void* and applies it as such. The types int * and void * are not compatible, and neither are the types of functions that take these arguments.

Note that gcc -Wall warns about the conversion when passing fun to apply():

t.c:11: warning: passing argument 1 of ‘apply’ from incompatible pointer type

Fixing this warning with a cast to void (*)(void*) is a programmer mistake. The bug indicated by the warning is that there is a risk that fun() will be applied with the wrong type, and this warning is justified here, since fun() will be applied with the wrong type inside function apply(). If we “fix” the program this way:

$ tail -3 t.c
int main(){
  apply((void (*)(void*))fun, &data);
}
$ gcc -std=c99 -Wall t.c
$ 

The explicit cast to void (*)(void*) silences the compiler, but the bug is still in the same place, in function apply().


Fortunately gcc -std=c99 -Wall is not the only static analyzer we can rely on. Frama-C's value analysis warns where the problem really is, in function apply(), and it warns for both the version with implicit conversion and the version with explicit cast:

$ frama-c -val t.c
…
[value] computing for function apply <- main.
        Called from t.c:14.
t.c:3:[value] warning: Function pointer and pointed function 'fun'  have incompatible types:
        void (void *) vs. void (int *x). assert(function type matches)

The correct way to use function apply() without changing it is to make a function with the correct type for it, and to pass that function to apply():

void stub(void *x){
  fun(x);  
}
…
  apply(stub, &data);

Note that in the above, x is implicitly converted when passed to function fun(), the same way that &data is implicitly converted to void* when passed to apply().

Conclusion

There is almost nothing you can do in C with a function pointer. The feature is still very useful and instills a bit of genericity in an otherwise decidedly low-level language.

Function pointers are seldom needed when using the standard library: qsort() is, together with pthread_create(), one of the few widely used functions that take a function pointer argument. Like pthread_create(), qsort() is often misused: it has its own entry in the C FAQ.


Jens Gustedt provided advice in the writing of this post.

Monday, August 5 2013

Exact case management in floating-point library functions

The documentation of the correctly-rounded CRlibm floating-point library states, for the difficult pow() function (p. 159):

Directed rounding requires additional work, in particular in subnormal handling and in exact case management. There are more exact cases in directed rounding modes, therefore the performance should also be inferior.

The phrase “exact case” refers to inputs that need to be treated specially because no number of “Ziv iterations”, at increasing precisions, can ever resolve which way the rounding should go.


Quiz: Isn't an exact case an exact case independently of the rounding mode? How can exact cases vary with the rounding mode?


If you can answer the above quiz without having to rubber duck through the entire question on an internet forum, you have me beat.
