Frama-C news and ideas


Tag - csmith


Thursday, March 29 2012

Why do signed overflows so often do what programmers expect?

Semi-serious musings

During the Frama-C random testing experiment described at length on this blog and this page, we found a few bugs in Csmith, too. John Regehr, one of the Csmith developers and, not entirely coincidentally, a co-author of the article linked in the previous post, is also a co-author of the NFM12 article where the random testing of Frama-C is described. Some time ago, while we were discussing, in the context of the latter article, the fact that Csmith had been used intensively, finding 400+ bugs in compilers, without the bugs that Frama-C uncovered being an annoyance or even noticed, he said:

It would also maybe be useful to remark that the Csmith bugs had not been detected because compilers are pretty weak at exploiting undefined behaviors (though they do this sometimes).


I do not agree with the choice of the adjective “weak”. I find compilers are pretty good at what they are supposed to do, which is apparently to ignore undefined behaviors in the program and generate the shortest and/or fastest sequences of instructions that compute the correct result for all defined executions. Sometimes, an opportunity arises to exploit the fact that some inputs cause undefined behavior in order to shorten or speed up the sequence of instructions. But in the general case, with industry-standard instruction sets, there just isn't any such opportunity. The shortest, fastest sequence that works for all defined inputs is the sequence that happens to give 2's complement semantics to signed overflows. I think that this is a fundamental fact of current instruction sets, and considering the rate at which these change, I do not expect programmers' misunderstandings about signed overflows to change in the near or far future. Even with much progress in compilation techniques, signed overflow will continue to give 2's complement results most of the time, and programmers will continue to be surprised when it doesn't.

And now for some more Regehr links

This blog post shows how good compilers already are.

I put forward a weak version of the argument presented here on the Frama-C mailing list a long time ago, and John answered with an example C program where the compiler can take advantage of the undefinedness of signed overflows to generate code that does not give 2's complement results.

John also discussed undefined behaviors on his blog, which prompted Chris Lattner to write a 3-post series with more of the compiler implementer's point of view and revealing examples.

Monday, January 16 2012

Csmith testing again

My presentation Friday at the U3CAT meeting was on the topic of Frama-C Csmith testing. Several posts in this blog already describe facets of this work (it has its own tag). Yet another angle can be found in this short article draft. Said draft, by the way, will soon need to be either finalized or resubmitted, so remarks are welcome if you have them. This webpage provides figures and implementation details that could have been included in the article to improve clarity, but weren't because we were aiming for the 6-page short format. The webpage is not intended to make sense separately from the article.


January 25, 2012: Updated with the most recent draft of the article. The companion webpage has been updated, too.

Sunday, December 4 2011

Explaining why Csmith matters even more than previously anticipated

Csmith as a static analyzer fuzzer

A new version of Csmith, the generator of random defined C programs, was released a few days ago. This is the version that many functions in Frama-C Nitrogen were debugged against. Conversely, a few bugs in the development versions of Csmith, characterized by generated programs that were not defined (although they compiled and ran fine), were found by pre-Nitrogen development versions of Frama-C and are fixed in the 2.1.0 Csmith release.

The original intention behind Csmith was to find bugs in compilers. As I pointed out in earlier posts, Csmith can also find bugs in static analysis tools.

Here is the latest bug found by Csmith in Frama-C. To explain: ACSL properties, such as alarms emitted by the value analysis, are attached to the beginning of statements. When the danger is in the implicit cast in the assignment of a function's result to an lvalue, the AST may have been normalized in such a way that there is no good place to attach the corresponding alarm: before the assignment-cum-function-call, it is too early to express properties about the function's result, and after it, it is too late. This Frama-C bug was found last because it cannot be found by using the value analysis as a C interpreter on defined programs, my initial testing method. In order to find this bug, alarms must be emitted during the analysis, and there aren't any when interpreting a defined program. The bug was found with a different testing method, in which the value analysis is not used as an interpreter, makes approximations, and therefore emits alarms. A simple workaround for this bug is to normalize the program differently, using the pre-existing option -no-collapse-call-cast.


One shouldn't base a research project on another research project

I was Benjamin C. Pierce's intern for a couple of months as an undergraduate. The title of this section is one of the insights he gave me then that stuck. In context, he probably meant the obvious fact that you do not want to spend your time running into bug after bug in the prototype implementation left over from the initial research project. Another, more insidious way this warning holds is that even if the ideas from the first project have a solid implementation, you may have trouble finding a public to tell about what you've done.


I find myself in a position where I would like to tell people that Csmith is even more useful than previously anticipated. It can find bugs not only in compilers but also, with some effort, in static analyzers. Isn't that important? (All right, perhaps I am biased here.)

Besides, even when there is no bug to find, Csmith can point out differences in assumptions between the static analyzer and the compiler used to compile safety-critical code. One thing I discovered using Csmith, for instance, was that you shouldn't expect different compilers to handle zero-width bitfields in packed structs the same way. None of the compilers I tried was wrong; they were just using different conventions. Frama-C implements one of the conventions. I have simply added this combination of constructs to the list of features for which one must check either that the analyzed program does not use them, or that the target compiler implements them exactly the same way as Frama-C does.

Sadly, Csmith authors think that generating programs that reveal these differences detracts from Csmith's original goals. When Csmith is used as originally intended, compilers are compared to one another. If they do not give the same result, one of them must be wrong. So they removed these combinations from the generated programs as I was reporting them (and they probably removed many more on their own that would have been interesting in the context of verifying that a static analyzer is well-matched with a specific compiler).

An experience report looking for a public

That Csmith is even more useful than anticipated is only the (luckily positive) conclusion of the experiment. Building up to this conclusion, there are the methods that were used in order to test various Frama-C functions, the categorization of bugs that were found in both Frama-C and Csmith, and perhaps a couple of interesting examples of these. This could fit well in the "experience report" format that some conferences have a tag for in their calls for papers. But I have had little success digging up a conference where submitting would not be a waste of time for me and for the future reviewers.

A Frama-C workshop or a Csmith workshop, if either one existed, would be a perfect fit. One half of the contents would make the report fully on-topic while the other half would give it a touch of exoticism. Unfortunately, both are research tools without the users to justify their own event. John Regehr presented Csmith at the GCC Summit, and I guess this might be a possibility, but this potential report is at most on-topic by association there. On the static analysis side, I do not see too many people being interested in the minutiae of C's pitfalls. One impression I got at ICPC, and already commented on in this blog, was that the "It's possible to do things on Java, so let's study Java" school of thought has more proponents than the "There are large codebases in C, let's see what we can do about C" school of thought. Even among the latter, I am a bit worried such a report would be perceived as dealing with details that other researchers have already internalized as unimportant when they decided to ignore them in their own work.


I am not complaining. In a world in which either Frama-C or Csmith, or comparable tools, were widely used, someone would already have done this experiment before me. I might even have had to read a book discussing the proper way to do this sort of thing before I started. Truly, each researcher probably finds emself in the same situation at one point or another. Experienced researchers are those who have learnt to anticipate that a possible line of work, even if it gives useful results, will give results that are hard to publish.

I am lucky in that "researcher" is only half my job description. Inventing methods to test Frama-C with Csmith as we were going along was much fun, and fixing the bugs was the first and foremost objective.

Friday, October 21 2011

How to waste a Friday evening

Here is a quick recipe for completely wasting a Friday evening:

  1. use Csmith to generate a program that GCC happens to mis-compile in 32-bit into an executable that produces the same result as the correct 64-bit compilation of the same program;
  2. mess up the Clang double-check by forgetting the -m32 option, so that it looks as if GCC and Clang both agree there is a bug in the value analysis (which produces a result different from the consensus);
  3. look for a bug that isn't there;
  4. goto 3

Monday, September 26 2011

New alarm category

There won't be many changes in the value analysis' documentation in Nitrogen. For lack of time, the new options, some of which were alluded to in this blog, will remain in their "to be documented" state. But documenting them more fully can always be done here, too, once they are released.

The one big change in the value analysis manual follows. It is a section describing a new kind of alarm, and inserts itself in section 3.2.2, "Log messages emitted by the value analysis — Proof obligations".

Partially overlapping lvalue assignment

Vaguely related to, but different from, undefined side-effects in expressions, the value analysis warns about the following program:

struct S { int a; int b; int c; };

struct T { int p; struct S s; };

union U { struct S s; struct T t; } u;

void copy(struct S *p, struct S *q)
{
  *p = *q;
}

int main(int c, char **v){
  u.s.b = 1;
  copy(&u.t.s, &u.s);
  return u.t.s.a + u.t.s.b + u.t.s.c;
}

The programmer thought ey was invoking implementation-defined behavior in the above program, using a union to type-pun between structs S and T. Unfortunately, this program returns 1 when compiled with clang -m32; it returns 2 when compiled with clang -m32 -O2; and it returns 0 when compiled with gcc -m32.

For a program as simple as the above, all these compilers are supposed to implement the same implementation-defined choices. Which compiler, if we may ask such a rhetorical question, is right? They all are, because the program is undefined. When function copy() is called from main(), the assignment *p = *q; breaks C99's 6.5.16.1:3 rule. This rule states that in an assignment from lvalue to lvalue, the left and right lvalues must overlap either exactly or not at all.

The program breaking this rule means compilers neither have to emit warnings (none of the above did) nor produce code that does what the programmer intended, whatever that was. Launched on the above program, the value analysis says:

... Partially overlapping lvalue assignment "*p=*q;".
Left address in bits: {{ u -> {32} }}.
Right address in bits: {{ u -> {0} }}.
assert(separated or same)

Acknowledgments and moral of the story

I first heard about the 6.5.16.1:3 rule from Chucky Ellison, and he had himself heard about it from Derek M. Jones. Later, the Csmith random program generator generated a program that computed differently when compiled with Clang and when interpreted by the value analysis (and indeed, that computed differently with different Clang optimization levels), all because of this rule. This is technically a bug in (that development version of) Csmith, which is supposed to generate defined C programs. But it successfully turned my hate of false positives against my laziness, and forced me to implement detection for programs that break this rule.

Monday, August 29 2011

CompCert gets a safe interpreter mode

Safe C interpreters galore

The last release of CompCert includes a safe C interpreter based on the compiler's reference semantics. Like KCC and Frama-C's value analysis, it detects a large class of undefined behaviors and some unspecified ones. Like them, in order to remain useful, it needs to make some difficult choices regarding what it accepts and what it rejects. C programs are seldom written entirely in the "strictly conforming" subset of the language, even when their authors advertise them as "portable". I once found on the Internet a "portable memcpy()" that started by decrementing one of its pointer arguments (a definite undefined behavior if the pointer points to the beginning of a block). This is one case that CompCert's interpreter and Frama-C's value analysis both accept. To warn, they both wait until you either dereference or compare the invalid pointer. I can't speak for Xavier, but the reason why the value analysis waits is that too many programs do this, and too many programmers do not even accept that it is wrong when you point it out to them (to be clear, I am not talking about the industrial users interested in Frama-C for verification of critical C programs here; that would be cause for worry. I am talking about the random C programmer you may find anywhere else). It would be very easy to modify the value analysis to warn as soon as the invalid pointer is computed if someone desired strict enforcement of the standard.


While Xavier was implementing this C interpreter, I was coincidentally working on an interpreter mode for the value analysis. This mode simply deactivates a number of features that are not useful for the step-by-step interpretation of deterministic C programs: recording memory states and detecting infinite loops. The option does not try to change the representation of memory states, which are still sets of possible valuations for all the variables in memory — in interpreter mode, the states all just happen to be singletons, and singletons that the interpreter represents slightly inefficiently. In fact, after it was done, I realized that this option would double up as a useful tool for some of the brute-force numerical precision analyses that are possible with the value analysis, so that right now, I am struggling to find a name for the option. Since a benchmark was coming up, I also deactivated all consistency checks inside the value analysis. Lastly, I linked Frama-C with a miracle library that I need to tell you more about in another blog post.

Benchmark

And there it was: the benchmark. This is an informal little thing, strictly for fun. There is no point in doing more serious measurements yet since Xavier may still improve the representation of memory states in CompCert's interpreter. I used the three files below:

  • SHA1 compression of a couple of small buffers, with a final check of the result: sha1.c
  • a program generated by the Csmith random C program generator. The program is in the intersection of what both interpreters support: csmith.c
  • the program main(){ signed char c; for(c=1; c; c++); return c;}

For reference, the commands used are (the Frama-C commands will not work with Carbon; you'll need to remove the unknown options; option -obviously-terminates can be approximated in Carbon with -slevel 999999999 -no-results):

$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates \
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each" sha1.c
...
[value] Called Frama_C_show_each({{ &"Test `%s': %s\n" }},{{ &test_input_1 }},
                                 {{ &"passed" }})
...
[value] Called Frama_C_show_each({{ &"Test `%s': %s\n" }},{{ &test_input_2 }},
                                 {{ &"passed" }})
...
$ time ccomp -interp sha1.c 
...
Test `abc': passed
Test `abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq': passed
...
$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates -big-ints-hex 0 \
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each -Iruntime" csmith.c
...
[value] Called Frama_C_show_each({{ &"checksum = %X\n" }},{0x3F65930A})
...
$ time ccomp -interp -Iruntime csmith.c
checksum = 3F65930A
$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates c.c
$ time ccomp -interp c.c

Results

The times shown are the medians of three runs.

For the small program with the signed char, ccomp -interp takes 0.016s and the value analysis 0.096s. I am tempted to attribute the value analysis' bad score to startup time: Frama-C is probably heavier than a compiler by now, and it loads all its available plug-ins at start-up. I could have improved this by disabling plug-ins that are not necessary for interpreting C programs this way.


For the SHA1 mini-benchmark, the value analysis takes 0.388s and ccomp -interp takes 0.296s.


For the Csmith random program:

Value analysis: 0.848s

CompCert interpreter: 0.616s

Conclusion

I think this is not bad at all for either of the two interpreters. I have examples where the value analysis does better than CompCert, but it wouldn't be fair to include them at this point, because they poke at a known weakness in CompCert's interpreter. On the other hand, I do not think that the value analysis will ever do much better than the results above. If I stumble on improvements that look like they'll improve both the analyzer and the interpreter in the value analysis, I'll implement these immediately, but I do not think I should compromise the analyzer's performance to improve the interpreter. It is slow enough as it is. Just ask Boris.

Wednesday, August 10 2011

Csmith testing reveals that I'm no good at probabilities (and lazy)

Csmith testing

A typical Frama-C Csmith testing script repeats four actions in an infinite loop:

  1. getting a random program from Csmith;
  2. compiling and executing it;
  3. analyzing it with Frama-C;
  4. using the results from step 3, possibly together with those of step 2, to determine whether the program reveals something that would interest a Frama-C developer.

Step 4 is the difficult part. I definitely want to hear about varied events that could happen during the analysis, and step 4 represents about half the testing script's line count. One difficulty is that if a Frama-C bug has already been identified, but isn't fixed yet, I do not want to be warned again about that bug until it is supposed to have been fixed. Sometimes it is possible to use Csmith command-line options to generate only programs that do not trigger the bug. Sometimes, this is impossible, and you have to recognize the bug in step 4.


A particular known bug in Frama-C was characterized by the backtrace below, that I am posting in its entirety to give you a chance to identify my mistake before I tell you about it.

         Raised at file "src/memory_state/new_offsetmap.ml", line 361, characters 26-37
         Called from file "src/memory_state/new_offsetmap.ml", line 377, characters 17-64
         Called from file "src/memory_state/new_offsetmap.ml", line 1394, characters 15-34
         Called from file "src/memory_state/new_offsetmap.ml", line 1436, characters 18-114
         Called from file "set.ml", line 293, characters 37-58
         Called from file "src/memory_state/new_offsetmap.ml", line 1433, characters 11-399
         Called from file "src/memory_state/offsetmap.ml", line 1308, characters 18-93
         Called from file "src/memory_state/lmap.ml", line 641, characters 24-260
         Called from file "src/memory_state/lmap.ml", line 626, characters 18-1023
         Called from file "src/memory_state/cvalue.ml", line 1171, characters 12-50
         Called from file "src/value/eval_exprs.ml", line 1399, characters 4-61
         Called from file "src/value/eval_exprs.ml", line 1447, characters 4-75
         Called from file "src/value/eval_exprs.ml", line 1058, characters 8-66
         Called from file "src/value/eval_exprs.ml", line 986, characters 10-62
         Called from file "src/value/eval_exprs.ml", line 1056, characters 8-52
         Called from file "src/value/eval_exprs.ml", line 1207, characters 4-55
         Called from file "src/value/eval_stmts.ml", line 446, characters 6-105
         Called from file "src/value/eval_stmts.ml", line 880, characters 54-69
         Called from file "list.ml", line 74, characters 24-34
         Called from file "src/memory_state/state_set.ml", line 26, characters 2-24
         Called from file "src/value/eval_stmts.ml", line 889, characters 14-232
         Called from file "cil/src/ext/dataflow.ml", line 320, characters 29-47
         Called from file "cil/src/ext/dataflow.ml", line 497, characters 8-21
         Called from file "cil/src/ext/dataflow.ml", line 501, characters 9-22
         Called from file "src/value/eval_funs.ml", line 100, characters 14-37
         Called from file "src/value/eval_funs.ml", line 502, characters 8-63
         Called from file "src/value/eval_stmts.ml", line 819, characters 12-71
         Called from file "src/value/eval_stmts.ml", line 831, characters 10-114
         Called from file "src/value/eval_stmts.ml", line 864, characters 38-62
         Called from file "list.ml", line 74, characters 24-34
         Called from file "src/value/eval_stmts.ml", line 920, characters 26-79
         Called from file "cil/src/ext/dataflow.ml", line 320, characters 29-47
         Called from file "cil/src/ext/dataflow.ml", line 497, characters 8-21
         Called from file "cil/src/ext/dataflow.ml", line 501, characters 9-22
         Called from file "src/value/eval_funs.ml", line 100, characters 14-37
         Called from file "src/value/eval_funs.ml", line 502, characters 8-63
         Called from file "src/value/eval_stmts.ml", line 819, characters 12-71
         Called from file "src/value/eval_stmts.ml", line 831, characters 10-114
         Called from file "src/value/eval_stmts.ml", line 864, characters 38-62
         Called from file "list.ml", line 74, characters 24-34
         Called from file "src/value/eval_stmts.ml", line 920, characters 26-79
         Called from file "cil/src/ext/dataflow.ml", line 320, characters 29-47
         Called from file "cil/src/ext/dataflow.ml", line 497, characters 8-21
         Called from file "cil/src/ext/dataflow.ml", line 501, characters 9-22
         Called from file "src/value/eval_funs.ml", line 100, characters 14-37
         Called from file "src/value/eval_funs.ml", line 482, characters 4-67
         Called from file "src/value/eval_funs.ml", line 564, characters 11-44
         Re-raised at file "src/value/eval_funs.ml", line 578, characters 47-50
         Called from file "src/project/state_builder.ml", line 1076, characters 9-13
         Re-raised at file "src/project/state_builder.ml", line 1080, characters 15-18
         Called from file "src/value/register.ml", line 49, characters 4-24
         Called from file "queue.ml", line 134, characters 6-20
         Called from file "src/kernel/boot.ml", line 36, characters 4-20
         Called from file "src/kernel/cmdline.ml", line 723, characters 2-9
         Called from file "src/kernel/cmdline.ml", line 200, characters 4-8
         
         Unexpected error (New_offsetmap.Make(V).End_reached).

The uncaught exception comes from new_offsetmap.ml:361, in an auxiliary function, but in this particular case, the place whence this auxiliary function is called (line 377 in the same file) is a good way to identify the bug. If the auxiliary function raises the same exception when it is called from another place, it's another bug. If it's from the same call, it's the same bug.

I adapted my shell script to include the following test in step 4:

if grep "line 377" analysis_log
then true # I do not know how to negate a condition in shell, and I do not want to learn
else ... # continue classifying the analysis
fi

I wanted to grep for file name and line number, but in the backtrace, the file name is surrounded by quotes, and I didn't know how to escape the quotes inside the grep command-line.

Mergeable Interval Maps

The file new_offsetmap.ml contains the implementation of the data structure described in this article. Pages 6 to 33 are in English. Incidentally, Dotclear does not seem to offer any way to specify that a linked document is half in one language and half in another.

Although the new data structure's implementation is far from complete, it is useful, for handling unions, to occasionally translate from the old datastructure to the new one. Csmith generates programs where unions can be written through a member and read from another (the members are only of integral types). The old data structure can handle most sequences of write combinations having occurred before an union is read, but a few complex combinations are not handled. Richard Bonichon and I already wrote the code to handle all possible combinations in the new data structure, and it would be a waste of time to backport this work to the old data structure at this point.


You may have guessed that an exception is raised in this module because it's newer and full of bugs, but in fact, what is currently implemented already works quite well (one of the advantages of having lots of tests and a trusted reference implementation). The reason the new implementation detects bugs is that knowing in advance what it was going to be used for, we included lots of assertion checks for conditions we knew weren't supposed to happen. The bug comes in much earlier, during parsing, and cannot conveniently be worked around when it is detected in this module.

My mistake

What mistake did I make? The backtrace I have shown is 55 lines long. Each of them contains the pattern "line xxx", where the values for "xxx" are not independent, but we'll assume that they are for the sake of this argument. All files are more than 377 lines long, and some of them are more than 3779 lines long. It is not sufficiently improbable that my script would hide from me another bug, with a completely different backtrace, that just happens to contain "line 377", or even "line 3775".


I noticed this a few days ago. Resolute in my decision not to learn how to escape quotes in shell, I decided to make false negatives much less likely by grepping for "line 377, characters 17-64" instead.

This is when I updated Frama-C from source control. The file new_offsetmap.ml being work in progress, it had been modified and the line that characterized the bug had changed. So I gave up, and started to experiment with grepping special characters. It turns out that it's possible to grep for double quotes by using the simple syntax below.

grep 'Raised at file "src/memory_state/new_offsetmap.ml"' analysis_log

What is the moral of this story? What I refused to learn was very simple; I eventually replaced an approximate test by another that is just as approximate, but in a different way; if you see other angles, please write them in the comments.

Aside

Are you thinking "55 frames in the call stack? What a bunch of bloated analysis framework! This thing must be completely useless!"? Then you are making judgments based on criteria you are familiar with and ignoring those that are new to you. A more correct reaction would be "I know how to implement a much lighter analyzer that will have all the same functionality and will be just as precise in 5000 lines of my favorite language over a week-end; I'll do that and show them."


Zaynah Dargaye provided useful comments on an early version of this post.

Saturday, July 30 2011

We have a Csmith-proof framework

Csmith, which I mentioned earlier in this blog, is a random generator of C programs. That much sounds easy, but it generates only well-defined programs, which two or more compilers have no excuse for compiling into executables that produce different results. And it generates varied and interesting enough C programs that it is possible to find lots of compiler bugs this way. These two constraints are obviously difficult to reconcile, and the people behind Csmith have done very impressive work.


Frama-C is a collection of analyses and transformations for C programs. In analyzing and transforming, it gives meaning to these programs. There are various ways of checking that the meaning given is consistent with the meaning implemented in a reference C compiler, so that Frama-C bugs can be found with this powerful tool, too.

Anne Pacalet, Benjamin Monate, Virgile Prevosto, Boris Yakobowski and I have fixed roughly 50 bugs in the framework's front-end and pretty-printer, and in the value analysis, constant propagation and slicing plug-ins over the last 6 months. Fifty may seem like a large number, but you have to consider the number of different functions the bugs were spread over. I am sure that Csmith found as many bugs in at least one open-source and widely respected compiler. As of today, the web page claims "more than 350 previously-unknown compiler bugs", and the number is probably outdated, since it increases weekly. The number does not include our own 50 as far as I know.

More importantly, all the Frama-C plug-ins for which I found a way to take advantage of Csmith seem to be completely immune now, excepting a couple of minor limitations that can be circumvented for additional Csmith testing until they are definitively fixed. The difficult design constraint for a Csmith-based Frama-C testing script is that it must produce no false positives: the script must be able to run day and night, on tens of processors, and only require human review when it has definitely found a bug in Frama-C (it is also acceptable to find bugs in the reference C compiler or in Csmith itself, but if the script taps my shoulder for me to look at something, there had better be a bug somewhere).

That does not mean that Frama-C's value analysis or slicing plug-ins have no bug left. There are many possible bugs that could not be found this way because of the limitations of the testing methodology (especially the "no false positive" constraint), and indeed, in these plug-ins, we were fixing bugs not found by Csmith during the same period. Still, I think it is a noteworthy milestone.


Lastly, Csmith-imperviousness is a moving target. There was a time when the front-end and value analysis plug-in had already reached Csmith-proofitude (it is better to do the layers from bottom to top, so as to avoid cascading bug reports where, for instance, the slicing is wrong because the value analysis feeds it bad values because of a typing bug in the front-end). Two weeks later, Xuejun Yang, the main implementor of Csmith, had improved it so that it generated new constructs and found new problems in these previously Csmith-robustified parts. I hear Xuejun is now writing his PhD. Hopefully this will allow us a few months to boast.

Tuesday, May 31 2011

The Stupidest Question I have ever heard at a Computer Science presentation

Some time back, I reported a Csmith bug against the last git version. This is in no way a rant against Csmith, a fine piece of engineering. Csmith does something that I would not have thought possible if you had asked me before I saw it: it generates random C programs that are both free of undefined behavior and varied enough to uncover a ton of bugs in the C-accepting program you pipe it into.

Unless the C-accepting program is CompCert, in which case you only uncover a couple of bugs, apparently.

Bugs in random programs have a funny way of finding themselves. You don't need to try many inputs, as the random programs explore these for you. Even if you think you are being quite the conservative user, they pop up uninvited from time to time (although, to be fair, you can see that they are being fixed, because they pop up less and less frequently). Anyway, dangling pointers were being passed as arguments to functions that didn't use them, although I was not using the option that allows this behavior in generated programs. This was my bug. It had been good company, but I didn't feel ready for a long-term relationship. It was time to let go.


Fast forward a few days, and there is this promising commit on github. I decide I may as well upgrade, although the bug was only appearing from time to time because I had fiddled with the parameters in a way that made it more likely to appear, and I had not seen it at all since I had fiddled with the parameters again.


Suddenly, as I re-compile, with GCC, this program generator that finds bugs in GCC, I am reminded of the stupidest question I ever heard at the end of a computer science presentation.

Like many bad questions, this question was more a remark than a question. Adapted to the present context, it would go like this:

It doesn't make sense to use Csmith to find bugs in C compilers, since it is itself compiled with a C (technically, C++) compiler.

Apparently, as a physics student, much earlier, the asker had been chided for using a pendulum-based chronometer to verify the constancy of a pendulum's period.

So, if I may be so bold as to impart my own wisdom to the world in the form of random remarks, there are two lessons in this post. "Be careful of circular reasoning" is one, but "software is not quite as similar to the physical world as you might expect" is another.