Frama-C news and ideas


Wednesday, December 12 2012

Seriously, Go?

I used to be curious about the D programming language. D had been pitched to me as “C done right”. Even before I had time to look at it, though, someone on StackOverflow was having an issue that stemmed from constant floating-point expressions being evaluated at compile-time with different semantics than the run-time semantics.

The C language has this problem too. But I do not see the point of switching to a similar, supposedly improved language if it does not at least fix this sort of issue.


History, it turns out, is cyclical modulo alpha-renaming. Today in another question on StackOverflow, someone was having a hard time writing the IEEE 754 constant -0.0 as a literal in the new programming language Go.

Go was not pitched to me as a better C, but it is pitched as a good replacement for some of C's applications. I had been mildly curious since I heard about it. Call me easily disappointed, but here is what I learned from today's question:

  1. In the current Go implementation, if you write -0.0, you get +0.0.
  2. If you write var a = 0.0 * -1.0, variable a is still initialized to +0.0.
  3. But you can obtain -0.0 if you write var a = 0.0 followed by a *= -1.0.


So Go has egregious differences between the run-time and compile-time semantics used for floating-point computations, but on the plus side, it sucks at recognizing static floating-point computations.

Saturday, December 1 2012

Syntax appropriateness

I know! Let us make [ and ] function as meta-characters when in code style. Users will surely love the ability to insert hyperlinks inside the code they are writing a blog post about.

—The authors of the Content Management System this blog relies on


In the previous post, (float){…}, discussed as if it were the syntax for anonymous arrays, should have read (float []){…}.
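
For the record, here is the corrected syntax in a complete program (a minimal sketch of my own, not code from the previous post):

#include <stdio.h>

int main(void)
{
  /* (float []){ ... } is a C99 compound literal of array type:
     an anonymous array whose elements can be accessed through p. */
  float *p = (float []){ 1.0f, 2.0f, 3.0f };
  printf("%f\n", p[1]);   /* prints 2.000000 */
  return 0;
}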

Monday, November 26 2012

Debugging

A remark I have not heard about the last two posts is the following.

Pascal, how can you, in your last post, claim that formal verification would have found the bug in your decimal-to-floating-point function? This is the kind of outrageous claim you rant against in your penultimate post! Formal verification did not find the bug. A code review did.

My answer is twofold. First, subscribers get the blog they deserve. If you cannot rail against and mock such obvious contradictions, what is to prevent me from putting them in? Second, I was not advertising a particular OCaml formal verification tool. I do not know of any that accepts the full language. It was in the full confidence that no such tool exists that I claimed that, if one existed, and were usable and automatic enough to have actually been used, it could have found the bug disclosed in the last post. If provided with an appropriate specification. Which, in the case at hand, could only be a decimal-to-floating-point implementation in itself.


The truth is that Frama-C development involves a lot of old-fashioned testing and debugging. One factor is that available OCaml programmer-assistance tools do not always scale to projects the size of Frama-C. Another is that Frama-C uses some hard-to-get-right, hard-to-diagnose techniques (for instance, mixing data unmarshaling and hash-consing).


Speaking of which, I think my Frama-C debugging colleagues will agree with me that this introduction to debugging is a nice read.

Sunday, November 18 2012

About the rounding error in these Patriot missiles

An old rant: misusing Ariane 5 in motivation slides

I was lucky to be an intern and then a PhD student at INRIA, while it was still called “INRIA” (it is now called “Inria”). This was about when researchers at INRIA and elsewhere were tasked with understanding the unfortunate software failure of the Ariane 5 maiden flight. So I heard the story from people I respect and who knew at least a little bit about the subject.

Ever since, it has irked me when this example is used to motivate some formal technique or other. Unless you attend this sort of CS conference, you might not believe the non-solutions that get presented as bringing us all closer to a world in which Ariane 5 rockets do not explode.


What irks me is this: even if the technique being motivated were comparably expensive to traditional testing, requiring comparable or a little less time for comparable or a little more confidence, that would not guarantee it would have been used to validate the component. The problem with Ariane 5 was not a failure of traditional tests. The problem was that, because of constraints of time and money, traditional tests were not applied to the component that eventually failed.

If your method is not cheaper and faster than not doing the tests, do not imply that it would have saved Ariane 5. It might have been omitted too.


The report is online. Key quote: “no test was performed to verify that the SRI would behave correctly when being subjected to the count-down and flight time sequence and the trajectory of Ariane 5.”

A new rant: the Patriot missile rounding error

If you attend that sort of conference, you have also heard about this other spectacular software failure, the Patriot missile rounding error. This bug too is used to justify new techniques.

This one did not irk me until today. I did not happen to be in the right place to get the inside scoop by osmosis. Or at the wrong time.

I vaguely knew that it had something to do with a clock inside the Patriot missile measuring tenths of seconds and the constant 0.1 not being representable in binary. When researchers tell the story on a motivation slide, it sounds like a stupid mistake.
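
To give an idea of the kind of drift involved, here is a simplified sketch in IEEE 754 single precision (not the Patriot's actual 24-bit fixed-point arithmetic): a software clock that adds a rounded 0.1 per tick slowly diverges from the true elapsed time.

#include <stdio.h>

int main(void)
{
  /* Tick a software clock every tenth of a second for one hour.
     0.1 has no finite binary representation, so each addition
     accumulates a small rounding error. */
  float clock = 0.0f;
  for (int i = 0; i < 36000; i++)
    clock += 0.1f;
  printf("expected 3600.000000, got %f (drift: %f seconds)\n",
         clock, clock - 3600.0f);
  return 0;
}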

There is nothing reprehensible in calculating with constants that are not represented finitely in binary. Other computations inside the missile surely involved the number π, which is not finitely representable in binary either. The designer of the system simply must understand what ey is doing. It can get a little tricky, especially when the software is evolved. Let me tell you about the single-precision cosf() function from the Glibc library in another rant.


As with the Ariane 5 case, a rather good-looking summary of the issue is available. Assuming that this summary is correct, and it certainly looks more plausible than the rash explanations you get at motivation-slide time, the next time I hear a researcher use the Patriot missile example to motivate eir talk, I will ask the questions that follow.

  1. When are you adapting your system to ad-hoc, 24-bit fixed-point computations (not whether it is theoretically possible but when you are doing it)?
  2. When are you adapting your system to ad-hoc, non-IEEE 754, 48-bit floating-point computations?
  3. Will your system then detect the drift between the same computation having gone one path and the other?


If your system is only practical in an alternate universe in which the Patriot missile software is cleanly implemented with good old IEEE 754 double-precision values: sorry, but in that universe, the Patriot missile does not exhibit the problem you are claiming to solve.


Thanks to Martin Jambon for proof-reading this post.

Tuesday, October 16 2012

Exact Gap Computation for Code Coverage Metrics in ISO-C

Comparing static analysis tools is (still) difficult

Earlier this year (2012), some of my colleagues and I took the opportunity to point out that, as a research community, we are not doing a satisfactory job of comparing static analysis tools. This article and blog post were concerned with independent third-party comparisons. Another context in which each of us can be led to do comparisons, and to try to be as objective as possible, is the “related work” section of any traditional, non-benchmark article.

“Related work” sections

Dirk Richter and Christian Berg have designed a technique that, apparently, it makes some sense to compare to what Frama-C's value analysis allows. Here is their summary of their comparison between their tool and the value analysis:

No sophisticated examples can be handled by Frama-C’s value analysis. Some of the examples tested even cause runtime-errors in Frama-C itself, thus it is not reliable.

Richter and Berg's technique appears to tackle the problem of estimating how much code must actually be covered for all the code that can be covered to have been covered.
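
In other words, and this is only my own illustrative reading of their abstract, the question is how much of a program a test suite can cover at best, given that some statements may be unreachable for any input:

#include <stdio.h>

/* One of the return statements below cannot be executed for any input,
   so no test suite can reach 100% statement coverage of this function.
   Computing the exact attainable maximum is the "gap" in question. */
int classify(int x)
{
  if (x >= 0)
    return 1;
  if (x > 10)   /* only evaluated when x < 0, hence always false */
    return 2;   /* unreachable */
  return 0;
}

int main(void)
{
  printf("%d %d\n", classify(7), classify(-7));
  return 0;
}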

Frankly, it seems to me that the researchers should be comparing their tool to existing test-generation tools. Those are much closer in objectives, and in the means deployed to reach these objectives. One such tool that I know of, because it is developed locally, is PathCrawler. I am sure there are plenty of others. I am not sure why they spend a paragraph on Frama-C's value analysis when it is absolutely not documented as trying to do this.

Possible comparisons

However, I am interested in tool comparisons, so this is a good opportunity to understand how these comparisons work and how we can make them better.


First, the claim “No sophisticated examples can be handled by Frama-C’s value analysis” is interesting, because of course what really kills advanced techniques in practice is that they do not scale to even moderately sized programs. The authors point out that their own work may only be “practical” for embedded systems. Coincidentally, Frama-C's value analysis was designed to work well on embedded code, too, since we too felt this was a domain where practicality was within reach. This makes me curious as to what embedded code examples the authors experimented on before reaching their conclusion. If these examples reveal weaknesses in Frama-C's value analysis on embedded code, we would definitely like to hear about them.

I would summarize this into one additional item that we did not list in our “Benchmarking static analyzers” article (although it was implicit in other recommendations we did list): when doing comparisons, conditions should be made explicit and input programs should be made available. Here, this would mean providing the Frama-C version and command line in addition to the test programs themselves. For the programs and command lines that “cause runtime-errors in Frama-C itself”, a good place to provide them is our Bug Tracking System; instructions for reporting a bug are displayed on crashes. Otherwise, a tarball with all the embedded C code examples the authors used would be fine. A URL could be provided in the article.


Second, the authors claim that Frama-C is not reliable. This makes me want to investigate because, precisely, in 2012 we published a peer-reviewed article on the subject “Frama-C is reliable, here is how we used random program generation to make it so”. I am eager to get my paws on the authors' implementation so as to evaluate its reliability, since this is supposed to be one aspect where it does better. Unfortunately, the implementation does not appear to be available yet.


Third, it is always a difficult exercise to compare one's own tool to a different, unfamiliar one. One risks ending up with a slightly unfair conclusion, even if one tries one's hardest to be objective. I was remarking about the difficulty of finding “fair” inputs for the comparison of static analyzers in another previous post. The difficulty for static analyzers is that inputs are programs: programming languages are so expressive that two programs, even in the same language, can differ widely.

There, as a remedy, I suggested tool authors first try their own tool on the programs that the other analyzer works well on. For someone working on another technique and unfamiliar with the strengths and weaknesses of Frama-C's value analysis, Csmith-generated examples are one obvious possibility. Another possibility would be to use the programs in the 2012 RERS challenge. These programs contain assert(0); calls and, for each, an objective is to tell whether it is reachable. This seems to me to be the same question as whether a test suite needs to cover the assert(0); call. The good news is that both on Csmith-generated programs and in the RERS challenge, Frama-C's value analysis can be made to be both sound and complete, just as Richter and Berg claim their technique is. It should then be simple and informative to compare the results of their tool to the results obtained with the value analysis. Below are our solutions for problems 1-9.

Reachable assertions in problems 1-9 of the RERS competition

Below is a list of reachable lines for each problem. The assert(0) calls not listed are unreachable.

Problem1.c

error_20: assert(0);
error_47: assert(0);
error_32: assert(0);
error_37: assert(0);
error_56: assert(0);
error_33: assert(0);
error_57: assert(0);
error_50: assert(0);
error_35: assert(0);
error_15: assert(0);
error_38: assert(0);
error_21: assert(0);
error_44: assert(0);
globalError: assert(0);

Problem2.c

error_50: assert(0);
error_45: assert(0);
error_59: assert(0);
globalError: assert(0);
error_43: assert(0);
error_13: assert(0);
error_16: assert(0);
error_44: assert(0);

Problem3.c

error_45: assert(0);
error_35: assert(0);
error_52: assert(0);
error_39: assert(0);
error_9: assert(0);
error_37: assert(0);
error_43: assert(0);
error_31: assert(0);
error_28: assert(0);
error_27: assert(0);
error_50: assert(0);
error_13: assert(0);
error_26: assert(0);
globalError: assert(0);

Problem4.c

error_12: assert(0);
error_19: assert(0);
error_31: assert(0);
error_39: assert(0);
error_52: assert(0);
error_6: assert(0);
error_58: assert(0);
error_40: assert(0);
error_4: assert(0);
error_38: assert(0);
error_45: assert(0);
error_11: assert(0);
error_26: assert(0);
globalError: assert(0);
error_9: assert(0);
error_17: assert(0);
error_32: assert(0);
error_35: assert(0);
error_55: assert(0);
error_36: assert(0);
error_14: assert(0);
error_18: assert(0);
error_13: assert(0);
error_15: assert(0);
error_27: assert(0);

Problem5.c

error_0: assert(0);
error_38: assert(0);
error_57: assert(0);
error_55: assert(0);
error_58: assert(0);
error_32: assert(0);
error_13: assert(0);
error_51: assert(0);
error_33: assert(0);
error_48: assert(0);
error_18: assert(0);
error_39: assert(0);
error_1: assert(0);
error_41: assert(0);
error_37: assert(0);
globalError: assert(0);
error_11: assert(0);
error_26: assert(0);
error_15: assert(0);
error_40: assert(0);
error_36: assert(0);
error_44: assert(0);
error_30: assert(0);
error_47: assert(0);
error_24: assert(0);

Problem6.c

error_12: assert(0);
error_21: assert(0);
error_11: assert(0);
error_44: assert(0);
error_1: assert(0);
error_36: assert(0);
error_0: assert(0);
error_2: assert(0);
error_38: assert(0);
error_48: assert(0);
error_37: assert(0);
error_4: assert(0);
error_59: assert(0);
error_10: assert(0);
error_20: assert(0);
error_5: assert(0);
error_15: assert(0);
error_27: assert(0);
error_33: assert(0);
error_9: assert(0);
error_29: assert(0);
error_47: assert(0);
error_56: assert(0);
error_24: assert(0);
error_58: assert(0);
globalError: assert(0);

Problem7.c

error_58: assert(0);
error_47: assert(0);
error_5: assert(0);
error_48: assert(0);
error_19: assert(0);
error_39: assert(0);
error_36: assert(0);
error_40: assert(0);
error_35: assert(0);
error_31: assert(0);
error_9: assert(0);
error_42: assert(0);
error_7: assert(0);
globalError: assert(0);
error_11: assert(0);
error_20: assert(0);
error_44: assert(0);
error_46: assert(0);
error_18: assert(0);
error_6: assert(0);
error_23: assert(0);
error_30: assert(0);
error_3: assert(0);
error_37: assert(0);
error_15: assert(0);

Problem8.c

error_48: assert(0);
error_2: assert(0);
error_49: assert(0);
error_15: assert(0);
error_7: assert(0);
error_55: assert(0);
error_51: assert(0);
error_50: assert(0);
error_43: assert(0);
error_10: assert(0);
error_29: assert(0);
error_24: assert(0);
error_1: assert(0);
error_26: assert(0);
error_6: assert(0);
error_5: assert(0);
error_46: assert(0);
error_13: assert(0);
error_4: assert(0);
error_37: assert(0);
globalError: assert(0);
error_34: assert(0);
error_25: assert(0);
error_28: assert(0);
error_59: assert(0);

Problem9.c

error_46: assert(0);
error_57: assert(0);
error_36: assert(0);
error_19: assert(0);
error_6: assert(0);
error_10: assert(0);
error_34: assert(0);
error_15: assert(0);
error_32: assert(0);
error_41: assert(0);
error_11: assert(0);
error_35: assert(0);
error_2: assert(0);
error_20: assert(0);
error_3: assert(0);
globalError: assert(0);
error_44: assert(0);
error_38: assert(0);
error_51: assert(0);
error_54: assert(0);
error_56: assert(0);
error_53: assert(0);
error_47: assert(0);
error_59: assert(0);
error_8: assert(0);

Tuesday, July 17 2012

Code smells smell

This blog post is on the subject of “code smells” and what is wrong with them. I initially wanted to go with the title “Code smells are crap”. That made it clearer that the title was grammatically a sentence, with “code smells” a nominal group followed by a verb. But there seems to be something in this blog's content management system that censors four-letter words in the title of articles, like some sort of administrator or something[1], I mean, a friendly moderator[2].


At the time of this writing, Wikipedia defines a code smell as “any symptom in the source code of a program that possibly indicates a deeper problem”. I do not challenge this definition. Indeed, to the best of my knowledge this is exactly the meaning many researchers (and engineers) working in software engineering intend when they talk of “code smells”. I think, however, that there is something fundamentally wrong, not with the definition, but with scientific use of any notion accurately described by this definition. The issue I see, in so many words, is that it is not falsifiable.

Say that you build a tool that detects “code smells”. You might even call your tool a “static analyzer”, and there might be so many of you and of your tools that “static analysis” is understood by the majority as the craft of building code smell detectors. The tool flags as smelly a diligent implementation of a subtle but efficient computation taken from The Art of Computer Programming. This deserves the name “false positive”, right? Surely your tool does not pretend to judge Knuth's carefully crafted, state-of-the-art algorithms? Ah, but no, it is not a false positive, because it is built right into the definition that a code smell only “possibly indicates a deeper problem”. Whatever the tool found is a code smell by virtue of having been flagged by the tool. It was not a problem, but it was a code smell because it could have indicated a problem.


As scientists, what the four-letter word are we supposed to do with such a notion?

Notes

[1] This post was crying so loud for moderation that it would have been a shame not to grant it

[2] a special kind of facetious colleague

Thursday, June 28 2012

Undefined behavior is a harsh mistress

Mark Shroyer has discovered an interesting consequence of undefined behavior in the compiled version of a C program: a variable behaves as both true and false, as a result of being uninitialized.
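
A minimal sketch in the same spirit (my own reconstruction, not Mark's actual code; whether both branches fire depends on the compiler, the optimization level, and plain luck):

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
  bool p;  /* deliberately left uninitialized: undefined behavior */
  if (p)
    printf("p is true\n");
  if (!p)
    printf("p is false\n");
  /* An optimizing compiler may evaluate the two tests independently,
     so both messages can be printed during the same run. */
  return 0;
}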

The post is great, and could not come at a better time. I needed, for the talk I will give on Saturday morning at COMPARE2012, convincing examples that some undefined behaviors cannot be recovered from. For a long time, I had wanted an example showing that “uninitialized” is not equivalent to “holding an unknown value”, and this is just it. Once again, procrastination wins!

On a side note, should I be worried that I am building tolerance to the panic of not having one's slides ready for an upcoming talk? If I can't even get into a frenzy in the last few days before the talk, I may just become useless. Perhaps it's time for retirement.


The only disappointing aspect of Mark's blog post is the general reaction to it, in the blog's comments or elsewhere. Guys, Mark obviously knows what undefined behavior is! He is not blaming GCC for not giving a consistent value to his variable p. Using uninitialized memory is a bug in his program; it's a bug he has been looking for, and the consequences of it were funny, so he shared. No need to get all professorial about it.

The so-stupid-it's-funny prize goes to the comment below.

First, you demonstrate ignorance talking about C language with the “bool” data type.

Dude, Mark demonstrates his mastery of C by properly using stdbool.h as part of a sound discussion of undefined behavior in a C program.


If you are in the Manchester area and are reading this, please drop by on Saturday. Gate-crash the conference if you have to. You can always blame my bad influence if you get caught.

Tuesday, June 12 2012

IOCCC

So I am at this conference, right?


Having been looking, for a few months now, without luck, for someone who teaches C programming*, I figure this conference is as good an occasion to find em as any. But it sounds kind of stupid to go to someone and just ask "excuse me, but would you happen to teach C?"


Having reduced a problem I have been stuck with for a while to the easier problem of not looking stupid, I find that if I augment my question a little, it may just become weird enough to look cute. And I can increase my chances at the same time. And learn something on the side. So I prepare a mini-survey that I feel scientific-minded people might receive positively:


  1. Do you happen to teach C?
  2. If you do not, do you know someone at this conference who might or who might recursively know someone who might?


You will have recognized a “small world”-type experiment, one that could reveal something interesting about the connectedness of the graph whose nodes are the attendees of this particular conference.

Just as I feel ready to start, two gentlemen in front of me arrive at a pause in their discussion. I seize the opportunity to introduce myself and ask them both my carefully prepared first question. And it turns out one of them is a teaching assistant for a C programming class. Disappointing, isn't it? Now I no longer have reason to continue my survey, and although I have learnt something about this conference's attendees, it will never be very statistically significant. I didn't even get to the “small world” part.


But wait, it does not end here! Just a little while later, as we are still discussing C in relation to teaching, a third person who knows us each interjects “oh, have you two guys found each other?” and, when it appears we did not even know we were looking for each other, does some proper introductions. It turns out that I am talking to a 2011 IOCCC laureate.

So at this conference I am at, one attendee in two has written a winning IOCCC entry. With low confidence.


(*) many “introduction to programming”-kind of classes have moved from C to Java, apparently, which would be a good thing if Java didn't have its own set of failures as a beginner's language. Anyway, there is a work-related experiment I want to conduct and it only works with a C course.

Tuesday, May 15 2012

On ending discussions and painting bikesheds

A high-profile maintainer of a large Open-Source project recently found himself involved in a discussion in which, at some point, after having explained his arguments—I think—very clearly, he had to call someone else a moron. Now everyone with write access to the internet is weighing in on how this is no way to behave and how the bike-shed really should be painted a more friendly color.


Most of the people I have seen opine do so with the confidence of someone who has created something as widely useful as Linux and knows how things are done. Dominik Dabrowski comments “You might have fun raging on the internet, but I think your goals would be better served if ...”. Kenneth Reitz writes a blog post with the title “Be Cordial or Be on Your Way”. If everyone who has expressed their opinion had created as much great software as Linus, this would be a better world. I might even not have to use Mac OS X.


Now, of course, the Github discussion is a little bit difficult to follow, because the comment Linus was replying to has been deleted. A little bit of investigation is needed to find out that he was probably replying to the sentence “I did not realizes that Linus' shit does not stink. Thanks for clearing that up...” (source).

“You're a moron” seems to me a perfectly fine retort to someone who thus butts into a public discussion in which you have already made your technical arguments clear. Plenty of alternatives are also acceptable, of course, including ending the thread. I do not see what the racket over Linus' answer is. But my point is that even if I thought his answer was unacceptable, I would first question whether my opinion should matter, as someone whose contribution is relatively small. Where is the evidence that doing things my way is better? Is there, for instance, anyone in the world with a comparable contribution (especially in terms of visibility) who isn't brutally frank in eir interactions with the thousands of persons ey has to deal with—some of whom, statistically, have to be pretty dim-witted? Perhaps this is the kind of personality one needs to have to build something useful in the first place. Perhaps one becomes more and more direct with each of the million e-mails I roughly estimate one needs to handle in the lifetime of such a project. Where, in other words, is the person that Linus Torvalds could use as role model?


To reiterate, I wish everyone who has been expressing their opinion on how to handle Open Source contributions had written the kind of highly visible, useful software that justifies having an opinion on contribution management. If they had written so much more, so much more useful software that someone like Linus could be on his way without being missed, that would be great.

Saturday, May 12 2012

Life stole my joke—and all I got is this lousy blog post

When a recent occasion to visit the USA on a professional basis presented itself, I nearly declined. I had a lengthy explanation ready for my refusal, and it might even have become a rant-tagged blog post. I finally went, so that hypothetical blog post never came to be. Today, I feel I have been cheated: a really good bit of my explanation would, if I were to re-use it now in another rant, appear unimaginative; stale, even.

See, my explanation of why I didn't want to be a visitor in the USA included a passage on how schools everywhere in the world, at all levels, have bullies, but only in the US do these get elected president. (You had to see it in context. There were tie-ins with culture and international politics. It was a beautiful rant.)

The reason this bit is suddenly worthless is of course the latest mini-scandal in US election debates. Here is the article through which I first heard of it. My rant was about another person, who may or may not have been US president at some point in eir political career.


It is not my place to comment on foreign presidentiables. The following comments are intended only in the abstract.

  • About the facts themselves, the fact that a brat bullied five homosexual fellow students does not in itself imply that the brat has a bias. If ey similarly bullied 95 heterosexual students, say, forcibly cutting the hair of 19 of them, and if 5% actually is the percentage of the population who recognizes itself as homosexual, ey cannot be blamed for being biased—although ey can still be blamed for being a bully.
  • Am I the only one to find “I [didn't think] the fellow was homosexual” a lousy excuse? It does not matter whether one “thinks”. From the point of view of the victim, it shouldn't be necessary to broadcast one's sexual orientation to be morally protected from harassment. A student who takes offence at visible lifestyle choices of eir colleagues that aren't any of eir business is guilty of intolerance. If it later turns out that a relatively large proportion of eir victims are from the same minority, that's just bad luck for em, because it makes the crime more obvious. But even if the victims did not happen to be from a recognizable minority, the crime would still be the same crime.
  • Finally, the only way I can imagine a kid doing the kind of “hijinks” described in the article and not being sent home for the remainder of the school year is if ey is a very rich kid in a school/university that needs to worry where its next budget is going to come from. Since French universities have begun to “gain autonomy” under our own previous president, are we now going to have the same system, where rich kids get a free pass with discipline, even at the cost of the peace of mind of their fellow students? A few decades from now, will we have to elect them for the highest offices, too?


If told that this post is too off-topic for this blog, I plan to use the “but the link to the original article came from my colleague Eric Eide” defense. That, and something on the general topic of conference attendance and testicle massages.
