Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Friday, February 1 2013

ENSL seminar

Seminar

As anticipated, I was at my alma mater's student seminar last tuesday. School and seminar were very much like I remembered them. The latter was improved by orange juice and biscuits to chat around after the talk, that I do not think were part of the protocol when I was a student.

My years at the ENS of Lyon were a good time, but I had forgotten the reasons that made them so. If any of the students from the seminar read this, I hope they will pass on my thanks to everyone for reminding me why. In one word, it is the company.


The sort of discussion that emerges naturally in an open, low-stakes chat after a general presentation of Frama-C but is too open-ended to be part of the talk's questions is that of general usefulness.

Not every programmer is writing critical embedded code—in fact, most of us aren't. But in this day and age, many of us who program at all write code with mild to severe security implications, and I believe that with a reasonable amount of work, Frama-C can next help here.

Video games, and software security in the news

The seminar also gave me an excuse to visit my family that lives in the general area of Lyon. As a result, today, I was helping my nephew play Professor Fizzwizzle (now an online game, apparently) after lunch, when the 13:00 TV news announced that some webcams, sold for the purpose of allowing homeowners to watch their living room remotely and reassure themselves that they are not being burgled, instead allowed anyone with internet access to watch, at any time.

This was not by design, of course. This was the result of a security flaw. I am not saying that the security flaw was one of the kind Frama-C identifies, say, a buffer overflow in a C program: the France 2 news report (in French, available for a week) did not go into such detail. But I am willing to bet that software was involved.

The future

So what world is my three-year-old nephew going to grow in? Either we collectively solve the software security problem, and hopefully, when he is in age of lending an ear to TV news while playing Professor Fizzwizzle, there won't be any more of these spectacular failures to announce. Or we do not, and then, at the current rhythm, TV news won't announce security failures because they will not be news any more.

A reflexive illustration

One last thing: this blog post contains two links, one that I expect to be to a Java web applet (my nephew was playing a game that I bought a license for and downloaded the old-fashioned way in the early 2000s, but it always felt like Java, especially at launch-time). The other is to a Flash video. If either of them worked for you, you should seriously think about disabling plug-ins in your browser. Yes, you will be missing out, but frankly, this is the only solution at this time.


Students from ENSL or elsewhere, if you would like to help solve this problem (at least the part where embedded-like C code is concerned) and at the same time solve your problem of picking a first- or second-year internship, e-mail addresses of Frama-C developers are either firstname dot lastname at inria dot fr for contributors from Inria, or firstname dot lastname at cea dot fr for contributors from CEA (including me).

Thursday, January 24 2013

Customers, customers, customers

The recent posts on extremely minor undefined behaviors in zlib neatly tie in with a discussion on John Regehr's blog about the future-proofitude of C and C++.


Another insightful post in this regard is this one by Derek Jones. Derek claims that the situation is different for proprietary compilers with paying customers. The argument rings true to me. The only proprietary compiler I know that competes with GCC or Clang in terms of aggressive optimization (at the cost of breakage of existing code) is the Intel C++ compiler. But that is not a counter-example: I did not pay for icc, nor do I know anyone who pays for it in order to use it in production.

Wednesday, January 23 2013

2000s

Blogger of multiple qualities Harry McCracken was recently still looking for “an iPad PDF reader which can handle giant files which are 100s of pages long without choking”. Sorry, I meant “STILL looking”.

PDF is a nineteen-nineties technology to display text and pictures. Key quote: “Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it”.

Now, you might think that any proposal made in 1991 to master the intricacies of displaying text and pictures would have succeeded by 2013, but apparently not, according to Harry's call for help.


Fortunately, another nineteen-nineties technology came to the rescue: the Djvu format was created in 1996 for the purpose, I wish I was kidding, to store text and images. Key quote: “a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs”.


It is good to see progress at work. Even if we acknowledge that it takes years to get these things right and that a project started in 1996 is bound to be more technologically advanced than one started in 1991, there remains the question, what the file have we been collectively doing during the years 2000?


This leads me into a personal anecdote. In 1995, I was admitted at the ENS of Lyon (and I may be thinking about it because I am visiting back there on January 29. Drop by if you are around…). These were interesting times: 1995 is more or less the year the internet slash the world wide web started to become widespread in the US and then in other parts of the world. We were privileged to have access to it already at the ENSL.

In 1996, I left France for a 2-and-a-half months internship at Indiana University. Before I left, I taught my father to use the 28800 BPS external modem I was leaving him to dial long-distance to Lyon, impersonate me on the ENSL's server, upload a text file containing news of the family, download a text file that might contain news from me, and disconnect as soon as possible because heck, that was expensive (although cheaper than any available alternative). This was done with software that was called either “Kermit” or “ZMODEM”. Kermit, I am pretty sure it was. You could not trust these new-fangled protocols then any more than you can now.

Nowadays, my father has his own e-mail address and he may write to anyone he likes, as opposed to leaving messages in a dropbox for me. And he does not pay long-distance rates for it.

But I can't help thinking that all progress that has occurred since then was incremental. Slightly better for much cheaper. There has been the iPhone. But then again, the first PalmPilot was launched in 1997.


So, what have we been doing during the years 2000? What technologies were so life-changing that we are going to take them for granted in the 2010s?

Wednesday, January 16 2013

Bad zlib, bad! No compare pointer!

In a previous post, we remarked that the decompression function of zlib, for some inputs, computes an invalid pointer. But at least it neither dereferences it nor compares it to another pointer.

Or does it?

Recipe for an invalid pointer comparison

Instrument

Take an ordinary zlib library version 1.2.7, and instrument it according to the diff below.

$ diff -u zlib-1.2.7/inffast.c zlib-modified/inffast.c
--- zlib-1.2.7/inffast.c	2010-04-19 06:16:23.000000000 +0200
+++ zlib-modified/inffast.c	2013-01-16 23:37:55.000000000 +0100
@@ -64,6 +64,9 @@
       requires strm->avail_out >= 258 for each loop to avoid checking for
       output space.
  */
+
+int printf(const char *fmt, ...);
+
 void ZLIB_INTERNAL inflate_fast(strm, start)
 z_streamp strm;
 unsigned start;         /* inflate()'s starting value for strm->avail_out */
@@ -316,6 +319,7 @@
     strm->next_in = in + OFF;
     strm->next_out = out + OFF;
     strm->avail_in = (unsigned)(in < last ? 5 + (last - in) : 5 - (in - last));
+    printf("out=%p, end=%p\n", out, end);
     strm->avail_out = (unsigned)(out < end ?
                                  257 + (end - out) : 257 - (out - end));
     state->hold = hold;

This instrumentation causes two pointers, out and end, that are compared at that point of the library, to be printed. Apart from that, it does not change the behavior of the library. The library behaves the same when the printf() call is not there, albeit less observably.

Following the documentation, write an ordinary main()

Prepare a main() function that uncompresses a buffer deflated_buffer more or less according to the official tutorial:

main(){
  unsigned char out_buffer[CHUNK_OUT];

  printf("out_buffer: %p\n", out_buffer);

  /* allocate inflate state */
  ...

  ret = inflateInit(&my_strm);

  if (ret != Z_OK)
    return ret;

  my_strm.next_in = deflated_buffer;
  my_strm.avail_in = 40;
  my_strm.next_out = out_buffer;
  my_strm.avail_out = CHUNK_OUT;
	
  ret = inflate(&my_strm, Z_FINISH);

  ...
}

You can download the file zlib_UB.c.

Secret ingredient

Season with a specially crafted deflated buffer:

unsigned char deflated_buffer[40] = {
  120,
  156,
  67,
  84,
...

Test

Having built the instrumented zlib, you can compile and link the main() function:

$ gcc -I Downloads/zlib-modified/ zlib_UB.c Downloads/zlib-modified/libz.a
$ ./a.out 
out_buffer: 0x7fff5bd9fa34
out=0x7fff5bd9fa33, end=0x7fff5bd9fa5e

Although we followed the documentation when writing the main() function that calls it, zlib appears to be, at inffast.c:320, comparing a pointer out that points before out_buffer to a pointer end that points inside out_buffer. In fact, the pointer out has been computed as an offset of out_buffer, namely, out_buffer - 1. Zlib is not supposed to do this, for two reasons:

  1. The compiler could place out_buffer at address 0, or at the beginning of a segment in a segmented architecture. Then out_buffer - 1 would evaluate to 0xffffffffffff and appear to be larger than end, or cause a hardware exception. To be fair, in general the same value still ends up being stored in strm->avail_out, because both branches of the expression (out < end ? 257 + (end - out) : 257 - (out - end)) compute the same thing.
  2. Even without the array out_buffer being placed at address 0—which is rather unlikely—the compiler could suddenly become too smart for its own good and generate code that treats the condition out < end as false in this case. This has happened before. Key quote: “Some versions of gcc may silently discard certain checks for overflow. Applications compiled with these versions of gcc may be vulnerable to buffer overflows.” In this case, the invalid pointer out_buffer - 1 would be interfering with a compiler optimization, so anything might happen. Since it is strm->avail_out the statement is computing, a buffer overflow might result.

Fix

This problem and the previously described one have the same cause; they stem from the following optimization in inffast.c:

#ifdef POSTINC
#  define OFF 0
#  define PUP(a) *(a)++
#else
#  define OFF 1
#  define PUP(a) *++(a)
#endif

Defining POSTINC makes both the minor issues go away.

Conclusion

The defect identified here is related to the previous one, and like it, it is probably innocuous. Nevertheless, we initially set out to formally verify zlib, so we should stick to it: a formally verified zlib wouldn't compute negative offsets of arrays, much less compare them to valid offsets of the same array. So far, I have not been able to confirm any of the potential issues listed by Frama-C's value analysis in zlib with POSTINC defined, so that version of the library may reveal itself to be the formally verified version we are after.


This post was proofread by Fabrice Derepas and Olivier Gay.

Monday, January 14 2013

Why verify zlib?

As an interlude in the zlib verification thread, this post asks two questions. Is there any chance of finding a bug in zlib, and does it matter?

Could there be a bug in zlib?

It is not entirely impossible. The previous post in this blog pointed to a relatively minor issue in the zlib source code, and the next post will show a more serious (albeit still minor) one.

In July of 2005, a buffer overflow in zlib was found by Tavis Ormandy. Key quote: “An attacker could construct a malformed data stream, embedding it within network communication or an application file format, potentially resulting in the execution of arbitrary code when decoded by the application using the zlib library.”

If you have been following the zlib thread, this is exactly the context we are investigating: decompression of a maliciously crafted stream.

Does it matter?

In 2005, people did think it mattered. The CVE summary points out that zlib is involved when decompressing a PNG image (most users think of PNG images as innocuous objects that they should be allowed to look at regardless of origin without compromising the host computer). Other examples are the “Windows binary of Virtual Floppy Drive 2.1” and “Official Windows binaries of curl”. I do not think that list is exhaustive. Does the HTTP/1.1 protocol not allow for zlib-compressed data to be exchanged between client and server? A malicious client could cause a buffer overflow on the server with a simple request, and/or a malicious server could cause a buffer overflow in the client by replying to a request with specially crafted pretend-compressed data.

And a benchmark for the future

I am making a deliberate effort not to look at that bug now, but it goes without saying that when the verification of zlib is done, I will check that the same method find the bug that was in the pre-July-2005 version.

Thursday, January 10 2013

Code review finds minor issue in Zlib

In an article about comparing static analyzers (long-time readers of the blog, do not follow the link. It is still the same old article)…

Where was I? Ah, yes. One reason why it is extremely tricky to compare static analyzers is that a static analyzer is for identifying undefined behavior in C programs, but while some families of undefined behavior are extremely dangerous, C has many families of undefined behaviors. Some of them have been rather harmless until now, and programmers voluntarily use constructs that cause them. One example, briefly alluded to in a certain article, is the computation of an invalid offset of a pointer.

It is not alright for a C program to compute an invalid offset as in char t[50]; ...(t-1).... This is technically undefined behavior. Unfortunately, it is not alright for a C static analyzer to warn for this either, because the programmer is likely doing it on purpose. With some compilation options, the QuickLZ compression library does it (on purpose).

Frama-C's value analysis choice is to remain silent for the computation of t-1 but to warn if it is ultimately compared to another pointer—it could, for instance, equal &c against the programmer's expectations. Or, of course, the value analysis warns if the invalid pointer is dereferenced.

Et tu, zlib?

Here is the beginning of file inffast.c in library zlib:

#ifdef POSTINC
...
#else
#  define OFF 1
#  define PUP(a) *++(a)
#endif

When you see *++p in a C program, you can expect an invalid offset computation nearby. Indeed:

    in = strm->next_in - OFF;
    ...
    out = strm->next_out - OFF;

Zlib's documentation allows the user of the library to pass in strm->next_in and strm->next_out pointers to the beginning of allocated blocks. This can cause a minor undefined behavior for some inputs.

A “bug” not found by static analysis

Nothing would be easier than implementing the detection of this undefined behavior in Frama-C, but seeing more and more code that invoke it does not convince me that programmers want to know about it. Anyway, I just stumbled on macro PUP as I was analyzing zlib.

Post-scriptum

A way to exhibit the undefined behavior is to use the input vector in a future post together with a more stringent instrumentation:

--- zlib-1.2.7/inffast.c	2010-04-19 06:16:23.000000000 +0200
+++ zlib-modified/inffast.c	2013-01-21 00:56:25.000000000 +0100
@@ -64,6 +64,9 @@
       requires strm->avail_out >= 258 for each loop to avoid checking for
       output space.
  */
+
+int printf(const char *fmt, ...);
+
 void ZLIB_INTERNAL inflate_fast(strm, start)
 z_streamp strm;
 unsigned start;         /* inflate()'s starting value for strm->avail_out */
@@ -98,6 +101,7 @@
     state = (struct inflate_state FAR *)strm->state;
     in = strm->next_in - OFF;
     last = in + (strm->avail_in - 5);
+    printf("strm->next_out=%p and zlib is about to subtract one from it\n", strm->next_out);
     out = strm->next_out - OFF;
     beg = out - (start - strm->avail_out);
     end = out + (strm->avail_out - 257);
@@ -316,6 +320,7 @@
     strm->next_in = in + OFF;
     strm->next_out = out + OFF;
     strm->avail_in = (unsigned)(in < last ? 5 + (last - in) : 5 - (in - last));
+    printf("out=%p, end=%p\n", out, end);
     strm->avail_out = (unsigned)(out < end ?
                                  257 + (end - out) : 257 - (out - end));
     state->hold = hold;

When I compile and execute I get:

$ gcc -I Downloads/zlib-modified/ zlib_UB.c Downloads/zlib-modified/libz.a
$ ./a.out 
out_buffer: 0x7fff59e83a34
strm->next_out=0x7fff59e83a34 and zlib is about to subtract one from it
out=0x7fff59e83a33, end=0x7fff59e83a5e

Tuesday, January 1 2013

Software obsolescence and security

A couple of months ago, I packed a computer into a brown plastic bag of the kind usually used for trash. I then carried the carefully packed computer down to the basement. Physically, the computer still works. It is still beautiful (it is an iMac G4). It has been with me for the last ten years: I bought it second-hand in December 2002 from my friend Damien Doligez. I have used it for both work and entertainment for several years, and then gave it to my computerwise-unsophisticated sister who has been using it since. The design, much friendlier than later iMacs, and even than earlier iMacs, evokes the Luxo Pixar character. This computer was built to last: after ten years, the white plastics are still white. I pick this insignificant detail not because I particularly care about the tint of my computing devices, but because I don't. This is not an accident. People took care of the details when designing this computer, all the way down to how it would look as it aged.

The iMac G4 came with the cutest matched loudspeakers. The integrated modem would not find much use nowadays, but the computer also has USB, firewire and ethernet ports, and Wi-Fi. The thinking behind the integrated modem remains avant-gardist even if the technology is now outdated: this computer was the hub of its buyer's future digital lifestyle. It was intended to be connected, and it was intended not to look like an untidy mess in your living room.


300px-IMac.jpg


I opened it a couple of times to upgrade the hard disk and optical drive. It reminded me of the glimpse I once took as a student of the insides of a Sun workstation. I don't think I could explain this particular feeling to someone who did not see the inside of a Sun workstation—or of an iMac. As a concrete example that does not do justice to the experience, the unused screw holes of the factory-installed optical drive I retrieved from the computer had been taped over, so that air would not be sucked through the drive, lest dust accumulate on the drive's door as years passed.


Unfortunately, some functions mostly everyone expects from a computer nowadays is to display e-mail and web pages, and this computer is no longer fit for this purpose. This isn't a case of running new software on old hardware. This has never worked, but one used to be able to use loved hi-tech gadgets way past obsolescence as long as one avoided the mistake of installing new software on them. For lustra, you could keep using the spreadsheet you were used to for accounting or your simple-but-sufficient word processor. Even games kept working!

The problem nowadays is that the option not to upgrade software is no longer available: software more than a few years old ceases to receive security patches. The iMac G4 in question cannot be upgraded past Mac OS X 10.4. On Mac OS X 10.4, Firefox cannot be upgraded past version 3.6. Firefox 3.6 received its last security update in March 2012 and has now joined the Great Repository in the Sky.


Here is an idea for 2013: how about making software that works? Of course, it would not automatically adapt to new protocols, but as long as the user was satisfied with the old ones, this software would simply keep working. It would not have to be constantly updated with security patches because it would have been made right the first time, just like hardware sometimes is.

I would not want to give the impression that I am unfairly picking on the fine developers of Firefox. Firefox 3.6 continued to be maintained long after Safari for OS X 10.4 had ceased to receive security updates. In fact, the TenFourFox project, a maintained variant of Firefox for precisely the kind of computer I took to the basement, is a testament to the power of Open Source Software. Unfortunately, there is no Adobe Flash support in TenFourFox, and I fear that this might complicate the life of a computerwise-unsophisticated user.


Acknowledgements: besides selling me his computer, Damien Doligez officially owned the first OS X-able mac I used, as a PhD student. Harry McCracken suggested to liven up this post with a picture. Gerhard Piezinger photographed an iMac very much like mine, and I used his picture.

Sunday, December 30 2012

December in Security

Robert Graham, of the blog Errata Security, predicts that “vulnerabilities in Acrobat Reader, Adobe Flash, and Java today […] will be announced and patched in 2013”.


As fate would have it, he could safely have included Internet Explorer 8 in his list of software products used by millions to process untrusted inputs.

Tuesday, December 18 2012

zlib progress: one comma misused

A few days ago, I announced that the world had been using an unverified zlib library for too long, and that we were going to fix this. This post is the first progress report. I have found a harmless undefined behavior in zlib and I have learnt something about the comma operator. This post only covers the latter aspect.

The comma operator

The comma operator can be used to build an expression e1, e2. When e1, e2 is evaluated, first e1 is evaluated (for its result and side-effects), then the result is discarded and e2 is evaluated.

The comma counts as a sequence point. Sequence points are the points in a C program in-between which it is illegal to both read and write the same memory location; for instance x++ + x is undefined behavior because there is no sequence point between the post-incrementation x++ and the use of the value of x. Coming back to the comma operator, since it counts as a sequence point, I thought this might allow writing (x++, y) + (y++, x), but I am not so sure about that now.

Easing the value analysis' job on zlib

Formally verifying such a library as zlib is difficult. Indeed, otherwise, considering its usefulness and fame, if it was easy, someone would already have advertized that they had formally verified it.

One verification plan that I wanted to try out on this library is the model-checking approach that we first experimented with last summer.

The words “model-checking” may evoke different things to different people. In this context, they mean the propagation of only unabstracted known-to-be-feasible states, so that if a division by zero is reached in the program, we know for sure that this dangerous division can be reached for some inputs. In other words, we are talking here about a technique to avoid false positives, in addition to avoiding false negatives as we always do.

There is a catch: the propagation of unabstracted states does not really scale to a codebase the size and complexity of zlib. My plan was to ease the job of Frama-C's value analysis by carefully re-introducing a little bit of abstraction. I intended to modify the zlib library so that the modified version has all the same opportunities to do something wrong with pointers as the original version, but the modified version always computes zero as output sequence. This would make many memory states that would otherwise have been different become identical, something that the value analysis efficiently recognizes and takes advantage of.

A beginner's mistake

And this is how I found myself modifying *output_pointer++ = expression_with_side_effects; so that it would only write 0 to the output buffer, but otherwise have all the same side-effects as the original. I chose to replace it with the statement below. And I was pretty pleased with myself.

  *output_pointer++ = expression_with_side_effects, 0;


Unfortunately, after one additional cycle of verification, it turned out that in the C syntax, assignment has higher precedence than comma. My modified statement was parsed as:

  (*output_pointer++ = expression_with_side_effects), 0;

It behaved exactly as the original statement in assigning non-zero values to the output buffer. I had instead intended:

  *output_pointer++ = (expression_with_side_effects, 0);

Wednesday, December 12 2012

Seriously, Go?

I used to be curious about the D programming language. D had been pitched to me as “C done right”. Even before I had time to look at it, though, someone on StackOverflow was having an issue that stemmed from constant floating-point expressions being evaluated at compile-time with different semantics than the run-time semantics.

The C language has this problem too. But I do not see the point of switching to a similar, supposedly improved language if it does not at least fix this sort of issue.


History, it turns out, is cyclical modulo alpha-renaming. Today in another question on StackOverflow, someone was having a hard time writing the IEEE 754 constant -0.0 as a literal in the new programming language Go.

Go was not pitched to me as a better C, but it is pitched as a good replacement for some of C's applications. I had been mildly curious since I heard about it. Call me easily disappointed, but here is what I learned from today's question:

  1. In the current Go implementation, if you write -0.0, you get +0.0.
  2. If you write var a = 0.0 * -1.0, variable a is still initialized to +0.0.
  3. But you can obtain -0.0 if you write var a = 0.0 followed by a *= -1.0.


So Go has egregious differences between the run-time and compile-time semantics used for floating-point computations, but on the plus side, it sucks at recognizing static floating-point computations.

- page 3 of 21 -