Frama-C news and ideas


Attack by Compiler

The title of this post, “Attack by Compiler”, has been at the back of my mind for several weeks. It started with a comment by jduck on a post earlier this year. The post's topic, the practical undefinedness of reading from uninitialized memory, and jduck's comment, awakened memories from a 2008 incident with the random number generator in OpenSSL.

As I am writing this, if I google “attack by compiler”, the first page of results includes the classic essay Reflections on Trusting Trust by Ken Thompson, Wikipedia's definition of a backdoor in computing, an article by David A. Wheeler on countering the attack described by Thompson, a commentary by Bruce Schneier on Wheeler's article, and a Linux Journal article by David Maynor on the practicality of the attack described by Thompson on the widespread GNU/Linux platform.


This post is about a slightly different idea.

Initial conditions: trustworthy compiler, peer-reviewed changes

Suppose that we start with a trustworthy compiler, widely distributed in both source and binary form. Some people are in the position to make changes to the compiler's source code and could, like Thompson in his essay, attempt to insert a backdoor in the compiler itself.

But now, each change is scrutinized by innumerable witnesses from the Open-Source community. I say “witnesses”; in fact, they are mostly students, but we will probably be forced to assume that they can't all be mischievous.

The attackers could try to obfuscate the backdoor as they insert it, but the problem is that some of the witnesses are bound to possess this character trait that the less they understand a change, the more they investigate it. Furthermore, once these witnesses have uncovered the truth, loss of credibility will ensue for the person who tried to sneak the backdoor in. This person will lose eir commit privilege to the compiler's sources, and people will recover untainted compilers in source and binary form from their archives. This kind of approach is risky and may only result in a temporary advantage—which may still be enough.

The underhanded approach

The 2013 edition of the Underhanded C Contest is under way. The contest defines underhanded code as:

code that is as readable, clear, innocent and straightforward as possible, and yet [fails] to perform at its apparent function

Underhandedness is exactly what an attacker with commit access to the source code of the widely used compiler should aim for. If the commit is underhanded enough, the committer may not only enjoy full deniability, but ey may ensure that the incriminating change stays in later versions of the compiler, as a “good” change. This implies that all affected security-sensitive applications, like the “login” program in Thompson's essay, must be updated to work around the now official backdoor in the compiler. In this scenario, even after the attack has been discovered, anytime someone unknowingly compiles an old version of “login” with a recent compiler, it's another win for the attacker.


Fortunately, we agree with Scott Craver that the C programming language is a very good context to be underhanded in, and a C compiler is even better. How about the following ideas?

  1. making pseudo-random number generators that rely on uninitialized memory less random, in the hope that this will result in weaker cryptographic keys for those who do not know about the flaw;
  2. optimizing a NULL test out of kernel code when it is one of several defenses that need to be bypassed;
  3. optimizing overflow tests of the form buffer + len >= buffer_end || buffer + len < buffer out of application code, so that buffer overflows do take place in code that is guarded this way;
  4. optimizing source code that was written to take constant time into binary code that reveals secrets by terminating early.
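To make idea (3) concrete, here is a sketch of the kind of bounds check being discussed (the names access_ok, buffer and buffer_end are invented for illustration, not taken from any real codebase). If len is very large, buffer + len wraps around; but computing a pointer more than one past the end of an object is undefined behavior in C, so a compiler is entitled to assume that buffer + len < buffer can never be true and silently delete that clause:

```c
#include <stddef.h>

/* Hypothetical bounds check in the style of idea (3).  The second
 * clause is intended to catch the case where `buffer + len` wraps
 * around the address space.  Because pointer arithmetic past
 * one-past-the-end of an object is undefined behavior, a compiler
 * may assume the wrap-around cannot happen and remove the clause,
 * re-enabling the very overflow the programmer tried to prevent. */
int access_ok(const char *buffer, const char *buffer_end, size_t len)
{
    if (buffer + len >= buffer_end || buffer + len < buffer)
        return 0;   /* reject: out of bounds (or so the programmer hoped) */
    return 1;       /* accept */
}
```

A portable way to write this check is to compare lengths instead of forming a possibly invalid pointer, e.g. len >= (size_t)(buffer_end - buffer).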

I am not being very original. According to this post by Xi Wang, idea (1) is only waiting for someone to give the compiler one last well-calibrated shove. The NULL test optimization was already implemented in the compiler when it was needed for a famous Linux kernel exploit. The interesting scenario would have been if someone had found that the code in tun_chr_poll() was almost exploitable and had submitted the GCC optimization to activate the problem, but it did not happen in that order. Idea (3) really happened.
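The tun_chr_poll() incident followed the pattern sketched below. This is a simplified illustration, not the actual kernel code; the structure and field names are stand-ins. The pointer is dereferenced before it is tested for NULL, so the compiler may conclude at the test that the pointer cannot be NULL (a NULL dereference would already have been undefined behavior) and delete the test:

```c
#include <stddef.h>

struct sock { int flags; };
struct tun_struct { struct sock *sk; };

/* Broken ordering: dereference first, check second.  The compiler is
 * allowed to infer from `tun->sk` that `tun` is non-NULL, and may
 * therefore remove the `if (!tun)` check as dead code. */
int poll_broken(struct tun_struct *tun)
{
    struct sock *sk = tun->sk;   /* undefined behavior if tun == NULL */
    if (!tun)                    /* may be optimized away */
        return -1;
    return sk->flags;
}

/* Safe ordering: check before any dereference. */
int poll_fixed(struct tun_struct *tun)
{
    if (!tun)
        return -1;
    return tun->sk->flags;
}
```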

Idea (4) has not been exploited that I know of, but it is only a matter of time. If I google for “constant-time memcmp”, I may find an implementation such as the following:

int memcmp_nta(const void *cs, const void *ct, size_t count)
{
  const unsigned char *su1, *su2;
  int res = 0;

  for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
    res |= (*su1 ^ *su2);

  return res;
}

Nothing in the C language definition forces a compiler to compile the above function into a function that does not return as soon as variable res contains (unsigned char)-1, not to mention the possibilities if the compiler first inlines the function in a site where its result is only compared to 0, and then optimizes the code for early termination. If I were trying to sneak in a compiler change that defeats the purpose of this memcmp_nta() function, I would bundle it with auto-vectorization improvements. It is a fashionable topic, and quite exciting if one does not care about non-functional properties such as execution time.
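To illustrate, here is one function a compiler could legally substitute for memcmp_nta() after inlining it at a call site where the result is only compared to 0 (the name memcmp_early_exit is mine). It agrees with the original on zero versus nonzero for every input, but its running time reveals the position of the first differing byte, which is exactly the side channel the constant-time version was written to close:

```c
#include <stddef.h>

/* What an aggressive compiler could legally produce for
 * `memcmp_nta(a, b, n) == 0` after inlining: observationally
 * equivalent (zero iff the buffers are equal), but the loop exits at
 * the first mismatch, so the execution time leaks where the buffers
 * first differ. */
int memcmp_early_exit(const void *cs, const void *ct, size_t count)
{
    const unsigned char *su1 = cs, *su2 = ct;
    for (; 0 < count; ++su1, ++su2, count--)
        if (*su1 != *su2)
            return 1;   /* nonzero, same as any nonzero res */
    return 0;
}
```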

Conclusion

The impracticality of the described attack is counterbalanced by some unusual benefits: at the time of the attack, someone may already have audited the pseudo-random number generator or function memcmp_nta() we used as examples. The audit may have considered both source and generated assembly code, and involved actual tests, at a time when the code was “safely” compiled. But the auditor is not going to come back to the review again and again each time a new compiler comes out. Like Monty Python's Spanish Inquisition, nobody expects the compiler-generated backdoor.


Three of my four examples involve undefined behavior. More generally, my examples all involve unwarranted programmer expectations about C idioms. This is the key to plausibly deniable compiler changes that reveal latent security flaws. What other unwarranted expectations should we take advantage of for an “attack by compiler”?