Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Formally verifying zlib

In a blog post earlier this year, John Regehr wonders when software verification will finally matter. He means “formal verification”, I am pretty sure. “Verification” is what practitioners of, say, the software development V-cycle have been doing for decades, and it has kept us safe for that long—at least, when it mattered most.

Formal verification on the other hand, despite having been used here and there, is not yet the big deal that it could be. You'd better read the post, I fear I may not be doing it justice.


In that post, John writes:

For example, I’d be happy to drop a proved-correct gzip into my Linux machine as long as it was respectably fast.


Okay, John, let us do this, and see how far we can go.


We'll start with zlib, that we are going to approach like we did QuickLZ, by verifying that the decompression feature is resilient to malicious inputs. I have described the first potential issue on this code review site. If you have an opinion on this potential issue, you can contribute by expressing it there (registering is a one-click process), or here in the comments if you prefer.


Edited to add: too late, Andrew Šveikauskas has already determined that our first alarm was a false positive, as you can see on codereview.stackexchange.com. Ah well… It would have been nice if out first alarm were a true positive, but that was unlikely.

In case you regret missing out on the fun, here is another one: when reaching line inffast.c:269, where from is computed as out - dist to be accessed next, what prevents variable out to point one byte into the output buffer and dist to be 2, or 3, or 19? I expect the library must guard against this in the code above, but this is the end of a long day and I did not find where. Also, this would typically be the kind of relational information that the value analysis fails to remember, so I am not promising there is a bug to find here.

Comments

1. On Friday, December 7 2012, 17:16 by John Regehr

Hi Pascal, as you might guess I'm a big fan of this effort.

It's too bad this kind of thing so often degenerates into difficult questions about reachability or use cases-- it would be so much easier to write the code slightly defensively in the first place.

Writing a quick fuzzer would be one way to look for cases where your maybe-false path is executed, but that doesn't scale to lots of libraries. I wonder if there is some way to leverage the setup work you did for your verification effort here in order to make it very easy to use Klee or another concolic tester to look for an input that executes this path?

2. On Friday, December 7 2012, 17:20 by John Regehr

If we assume for a moment that these issues are false alarms, can you show us what the ACSL for zlib looks like?

3. On Sunday, December 9 2012, 16:44 by pascal

Hello, John.

This attempt is still ongoing. I will have a better view of the issues when it is nearing completion. I do intend to eventually write up both the process and the results as a self-contained document.

The question of communicating what the value analysis knows to other techniques is one that deserves more investigation. I had nearly the same interrogation, about transmitting what the value analysis knows to a third-party human other than me.

In both cases, providing the main() function I wrote for the analysis, plus the alarm type and location is a start. Both Klee and a peer reviewer could make good use of these. But in the example first alarm, there was more info that I could have provided—and did eventually provide during the discussion with Andrew on codereview.stackexchange.com.

In fact, that first alarm that he resolved as a false one was one that could be avoided entirely with better global settings for the value analysis. I do not regret investigating it, because with his help, I now understand the library better; this should help avoid more false alarms as the verification progresses. I am now playing with settings that make the value analysis more precise in exchange for consuming more memory.