zlib progress: one comma misused
By pascal on Tuesday, December 18 2012, 10:24 - Permalink
A few days ago, I announced that the world had been using an unverified zlib library for too long, and that we were going to fix this. This post is the first progress report. I have found a harmless undefined behavior in zlib and I have learnt something about the comma operator. This post only covers the latter aspect.
The comma operator
The comma operator can be used to build an expression
e1, e2. When
e1, e2 is evaluated, first
e1 is evaluated (for its result and side-effects), then the result is discarded and
e2 is evaluated.
The comma counts as a sequence point. Sequence points are the points in a C program in-between which it is illegal to both read and write the same memory location; for instance
x++ + xis undefined behavior because there is no sequence point between the post-incrementation
x++and the use of the value of
x. Coming back to the comma operator, since it counts as a sequence point, I thought this might allow writing
(x++, y) + (y++, x), but I am not so sure about that now.
Easing the value analysis' job on zlib
Formally verifying such a library as zlib is difficult. Indeed, otherwise, considering its usefulness and fame, if it was easy, someone would already have advertized that they had formally verified it.
One verification plan that I wanted to try out on this library is the model-checking approach that we first experimented with last summer.
The words “model-checking” may evoke different things to different people. In this context, they mean the propagation of only unabstracted known-to-be-feasible states, so that if a division by zero is reached in the program, we know for sure that this dangerous division can be reached for some inputs. In other words, we are talking here about a technique to avoid false positives, in addition to avoiding false negatives as we always do.
There is a catch: the propagation of unabstracted states does not really scale to a codebase the size and complexity of zlib. My plan was to ease the job of Frama-C's value analysis by carefully re-introducing a little bit of abstraction. I intended to modify the zlib library so that the modified version has all the same opportunities to do something wrong with pointers as the original version, but the modified version always computes zero as output sequence. This would make many memory states that would otherwise have been different become identical, something that the value analysis efficiently recognizes and takes advantage of.
A beginner's mistake
And this is how I found myself modifying
*output_pointer++ = expression_with_side_effects; so that it would only write
0 to the output buffer, but otherwise have all the same side-effects as the original. I chose to replace it with the statement below.
And I was pretty pleased with myself.
*output_pointer++ = expression_with_side_effects, 0;
Unfortunately, after one additional cycle of verification, it turned out that in the C syntax, assignment has higher precedence than comma. My modified statement was parsed as:
(*output_pointer++ = expression_with_side_effects), 0;
It behaved exactly as the original statement in assigning non-zero values to the output buffer. I had instead intended:
*output_pointer++ = (expression_with_side_effects, 0);