SSL in the news
SSL is a protocol for point-to-point confidential and authenticated communication over an insecure medium. It is the protocol behind HTTPS, among many other uses. In an Internet-connected system, the SSL implementation stands at the frontier between the system and the hostile outside world. For this reason, SSL implementation flaws are a prime target for attacks.
An ordinary bug
A rather banal SSL implementation bug was revealed over the weekend. A duplicated line in a crucial, if ordinary-looking, sequence of validation operations means that some of the validation steps are not taken:
... if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail; goto fail; // <------------------------------------------------- if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; ... fail: SSLFreeBuffer(&signedHashes); SSLFreeBuffer(&hashCtx); return err; }
Note that the second consecutive
goto fail is executed with variable
err containing the result of the last validation operation,
0, indicating success. This value is returned as-is by the function; the caller is therefore mislead into believing that the validation succeeded, despite some of the validation steps not having been executed.
Because C lacks an exception mechanism, the above is an often-seen programming pattern. The programming style here can hardly be blamed: this is how the best C programs are written. Except of course for the extraneous
The SSL implementation, and thus the bug in question, are found in tens of millions of iOS devices, with a few additional Mac OS X computers on top of that. The scope of the security problem caused by this bug, and the obviousness of the issue when pointed out, have together lead to much commentary on Twitter and other popular discussion forums.
So reaction. Very noise
“I would never have had this problem because I know that
goto is bad”, some commenters claim. I wish I was caricaturing, but I am unfortunately only paraphrasing and combining several public reactions into one.
“I would never have had this problem because I use braces”, essentially state others. Certainly, the root cause of the problem must have been that the developer who introduced the bug was confused about the meaning of
if (c) stmt1; stmt2;. Just look how ey indented it!
These two often-heard remarks strongly suggest to use brace-less control flow or the presence of
goto as predictors of defects in C code. I am sure this will be a fruitful research direction. Someone from the software engineering academic community should look into it.
“Just adding a single line of code can bring a system to its knees”, reminds Arie van Deursen. True, true, an important lesson there. We should all insert the following line somewhere in our respective codebases from time to time and take a good look at the effect it has, in remembrance.
It is Ariane 5 all over again! Worse, instead of just the makers of formal verification systems, everyone seems to have a scheme to prevent the problem they already know is there.
An interesting bug in a different SSL implementation
The problem in most of the armchair analysis that has been going on on the Internet lies in the two following questions: how many more bugs in security-critical C code does the proposed remedy (be it more braces, fewer gotos, …) find? How many time-costly, tedious, soul-crushingly boring false positives does it uncover?
Since commenters have been generalizing on the basis of a population of one sample, I feel no shame in presenting my own single example, raising the total number of scrutinized bugs to two. Note that, for us to make statistically sound arguments, we will eventually need to extend the discussion to examples of correct code that we do not wish to change.
Until then, here is a funny SSL implementation bug. It is characterized as follows, in a version of PolarSSL 1.1.8 that colleagues and I have been modifying.
Screenshot 1: an alarm in our
PolarSSL expects the program in which it is incorporated to provide it with a function that receives data from the outside world and writes it to a memory buffer. We have made such a function, baptized it
tis_recv, and we have set up PolarSSL to use it.
tis_recv takes three arguments. The first one is a context argument in case someone needs one (our function ignores this argument). Second is a pointer to the buffer where the data is to be written, then third is the maximum size the function is allowed to write there.
We have specified our function
/*@ requires recv_r1: \valid(output+(0..output_len-1)) ; requires recv_r2: output_len > 0 ; */ int tis_recv(void* p, unsigned char * output, size_t output_len);
The screenshot indicates on the bottom right that the pre-condition
recv_r1, which states that the argument
output points to a buffer large enough for
output_len characters, is not verified. How could this be? Surely a false positive… Or someone is calling the function with the wrong arguments. It does not look like the problem is in our function.
The GUI informs us that the function
tis_recv is called in one place only, and that place is inside
ssl_fetch_input(). It is called through a function pointer stored inside a member of a struct accessed through a pointer. The GUI tells use that we can mentally substitute
Screenshot 2: a wrong third argument
The GUI tells us that the buffer that PolarSSL passes to
tis_recv() has size 16KiB-ish (not pictured), and that the variable
len passed as third argument appears to take any of the values of its
size_t type (pictured in the bottom panel below).
Screenshot 3: jumping back to the origin of the value
We inquire where the value of variable
len was last set, and the GUI tells us it is at the yellow line just above the function call (pictured, in the middle panel). Well, hum, yes, we could have noticed that, but it was faster to click.
Screenshot 4: argument
nb_want is too large
The value of
len is computed from
nb_want, and the value of
nb_want appears to be too large,
65540, for the size of the buffer that the GUI tells us we have (in the screenshot below, the values computed as possible for
nb_want are displayed in the bottom panel).
Screenshot 5: dangerous values come from caller
A new possibility offered by the Frama-C version I am using that may not even(*) be available in the latest release Fluorine is to observe, in the bottom panel, which call-sites and originating call-stacks cause which values to occur in a variable. Here, the GUI shows that
nb_want appears to be
65540 when called from function
ssl_read_record() at line 1178 in file ssl_tls.c of PolarSSL. This means we can continue the investigation there.
In contrast, the value of
nb_want can only be
ssl_fetch_input() is called from
ssl_parse_client_key_exchange(), so there is no need to look at that function: it is definitely not part of this problem.
(*) I don't remember. It has been a long time, has it not?
Screenshot 6: the seemingly too large argument is
The problem is that
ssl->in_msglen is too large here. But where does it come from?
ssl->in_msglen has been computed from two bytes sent by the network (bad, bad network). But the value of
ssl->in_msglen is supposed to have been validated before being used. This is what the lines between the obtention of the value and its use are supposed to do.
The value of
ssl->in_msglen is validated when
ssl->minor_ver is 0, and it is validated when
ssl->minor_ver is 1. But according to the GUI,
ssl->minor_ver can be any of
At this point it is only a matter of confirming that the call to
ssl_read_record() can indeed be reached with
ssl->minor_ver set to 2. This is where one switches to existential(*) mode, possibly crafting a malicious message, or simply building a convincing argument that values can converge here to cause bad things and send it to the official maintainer .
When I said that this was a modified PolarSSL 1.1.8 we were analyzing, I cheated a little. This is a 1.1.8 version in which I have re-introduced a bug that was there from 1.0.0 to 1.1.7. The principal maintainer of PolarSSL suggests to fix the bug by replacing
== SSL_MINOR_VERSION_1 by
(*) I like to think of this mode as “Bill Murray mode”, although you wouldn't see me say this out loud or put it in writing.
We have seen a second example of a real bug that can occur in an SSL implementation. Understandably in the current context, there has been much speculation over the last 48 hours on the innocence of the bug in Apple's implementation. Might it be a voluntarily inserted backdoor? Is the NSA behind this bug? (I link here to John Gruber's post because he writes clearly, but the same question is raised all over the Internet).
Allow me to put it this way: if the Apple SSL bug is a coup from the NSA, then you US citizens are lucky. Our spy agency in Europe is so much better that it does not even have a name you have heard before, and it is able to plant bugs where the buffer overflow leading to arbitrary code execution is three function calls away from the actual bug. The bug from our spy agency is so deniable that the code actually used to be fine when there were only two minor revisions of the SSL protocol. The backdoors from your spy agency are so lame that the Internet has opinions about them.
Real bugs come in all sizes and shapes. That one mistake with security implication also causes easily detectable unreachable code or wrongly-indented lines does not mean that all security flaws will be detected so easily, and that plenty of quirky but functionally correct code will not be wrongly flagged.
Speaking of varied bugs, “what about PolarSSL version 1.1.8 without the manually re-inserted bug from CVE-2013-5914?”, I hear you ask. This will be the subject of another blog post.
Philippe Herrmann was first to use Frama-C to find a security bug in PolarSSL (Philippe's bug was a denial of service in versions up to 1.1.1, fixed in 1.1.2). Anne Pacalet and Paul Bakker have helped me with various aspects of PolarSSL's verification, including but not limited to the bug described in this post. Twitter users aloria, thegrugq, and johnregehr provided comments on drafts of this post. And finally, the Internet made this post possible by being itself.