Tag - memcpy

Entries feed - Comments feed

Wednesday, July 25 2012

The restrict qualifier as an element of specification

An insightful interjection

Stephen Canon, whose explanations about the fine points of floating-point I have appreciated on StackOverflow, chimed in on the subject of the restrict qualifier. With minor edits for consistency with this blog's formatting:

Consider memcpy(); its arguments are declared restrict. This not only means that the source and destination pointers cannot be the same (which is what Pascal's "replacement" says), but also means that the buffers cannot overlap at all (i.e. the src pointer cannot be anywhere in the range [dst-(length-1), dst+(length-1)]).

restrict also has the virtue of making this contract apparent to your callers, as it's part of the API. People should know that they can't call memcpy() with overlapping buffers, and use memmove() instead. If the information is instead encoded in a cryptic line in the implementation, then it isn't obvious to a developer who intends to use the function.

Non-aliasing granularity

I noted that my not-really-serious proposal did not have the same granularity as the original, with both good and bad consequences. An advantage of my proposal is that it is possible to specify only that p does not alias with q and that r does not alias with t. As defined in C99, restrict is more concise, but this also means that there are some properties you might want to specify and that you cannot.

restrict as an element of specification

But I failed to realize that restrict, being a type qualifier (for the casual C/C++ programmer: like const), is part of the function prototype. I was only thinking of the opportunity for better code generation inside f(), but it would theoretically be possible to have compilers that warn about passing aliasing pointers to f().

That's the theory anyway. In practice the situation is even worse than for const. For const you get type-system level warnings about casts discarding the qualifier. The warnings are sometimes useful and sometimes annoying. For restrict I do not know of any tool that makes any check of proper use at the call site.

An alternative

One tool that comes close to checking for proper use of functions with const and restrict-like specifications is developed nearby (have you heard of it?). One drawback is that it is work in progress. And it uses its own syntax for function specifications.

Here is the specification for memcpy() in that system:

/*@ requires \valid(((char*)dest)+(0..n-1))
  @          && \valid(((char*)src)+(0..n-1));
  @ requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));
  @ assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];
  @ ensures memcmp((char*)dest,(char*)src,n) == 0 && \result == dest;
extern void *memcpy(void *restrict dest,
                    const void *restrict src, size_t n);

Yes, it is longer than just sprinkling the prototype with const and restrict, and I did not even include the definition of the logic predicate memcmp. But on the other hand, what you have here is a complete specification of memcpy(). You can ask your worst enemy to write a function to this specification, and ey proves that eir implementation satisfies the specification, you can rest assured that, for all valid inputs, the function terminates and computes the right result. The worst your enemy could do would be to provide you with a function that is arbitrarily slower than necessary.

Side note: solutions to the problem of your worst enemy providing you with a function that is too slow are being researched, but even researchers working on that would agree that this field has not reached the maturity of the solutions to the problem of your worst enemy providing you with a function that functionally does the wrong thing.

I can provide a neat trick for verifying that a function executes in constant time, but this is only a very partial solution to the problem.

Enough about this hypothetical worst enemy. The const part of memcpy()'s specification, and more, is contained in the clause:

assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];

The restrict part is contained in:

requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));

I might be biased, but I think it is just as clear, and separate concerns are better separated. The \separated predicate, for instance, is also useful for expressing exactly the conditions under which the statement (*p=*p) & (*q=*q); is defined.

The ACSL by example library contains more examples of function contracts.

Wednesday, March 2 2011

On memcpy (part 2: the OCaml source code approach)

When picking up the title for the previous post on function memcpy, I anticipated this second part would describe the Frama_C_memcpy built-in function. The subtitle "part 1: the source code approach" seemed a good idea, since the first part was about using C source code to tell the analyzer what this function memcpy it often encounters is.

I did not think further, and did not realize that I would spend this sequel post explaining that a built-in function is an OCaml function that directly transforms an abstract state into another abstract state without going through the tedious, instruction-by-instruction interpretation of C code. So this second post is still about source code.

It is not source code that it is recommended, or even documented how, to write yourself. But it is a convenient way for value analysis experts to introduce certain new features.

What new features? Smoking hot, blazing fast new features.

Also, features that work completely right neither the first nor the second time. The Formula One of features, if you will. In fact, you could be forgiven for calling them Virgin Racing features, although I prefer to think of them as Ferrari features.

As just stated, builtins are about using OCaml functions to describe the effect of an unavailable C function. For instance, calls to Frama_C_memcpy(d,s,l) are interpreted with an OCaml function that, in a single step, copies a slice of memory of length l*8 bits from s to d. And it is fast. Behold:

char s[10000], d[10000];

  // Make problem representative
  for (int i=0; i<100; i++)
    s[100*i] = i;

  // Time one or the other
  //  c_memcpy(d, s, 10000);
  Frama_C_memcpy(d, s, 10000);

  // Check that d contains the right thing
  return 0;

(download the entire program)

Using the C version of memcpy and sufficient unrolling, you get the expected contents for d:

$ time bin/toplevel.opt -val -slevel 20000 memcpytest.c -no-results
        d[0..99] ∈ {0; }
         [100] ∈ {1; }
         [101..199] ∈ {0; }
         [200] ∈ {2; }
user   0m23.155s

Using the Frama_C_memcpy built-in, you get the same precise result for d:

$ time bin/toplevel.opt -val -slevel 20000 memcpytest.c -no-results
        d[0..99] ∈ {0; }
         [100] ∈ {1; }
         [101..199] ∈ {0; }
         [200] ∈ {2; }
user   0m0.402s

You also get nearly 23 seconds of your life back.

So what are the things that can go wrong with a builtin replacement function?

  1. If the C code that is not analyzed would have caused an alarm to be emitted, the built-in should emit that alarm too, or at the very least be provided with a header file that offers an ACSL pre-condition along with a prototype for the builtin. Both alternatives work, but it is easy to forget to do either.
  2. All cases of imprecise arguments should be handled. Support for an imprecise length was added recently. Very imprecise source or destination pointers may still cause the analysis to abort (it is being done and I am not sure where we were the last time we were working on this).
  3. The builtin function calls the value analysis' internal functions in contexts that differ from the usual, well tested circumstances occurring during the analysis of normal C programs. Strange behaviors or bugs may be uncovered by the user. As an example, I was just preparing the next section of this post, where I would have shown how imprecise lengths were now handled. I was going to suggest modifying the above test program, thus:
  Frama_C_memcpy(d, s, Frama_C_nondet(150,10000));

This brought up the following two remarks: first, there is an internal function for moving slices of the abstract state around that is slower than it should be, and it shows on this test. Secondly, the value analysis in Carbon will tell you that bits 1200 to 79999 of d contain either 0 or a very large number, a number with about 23400 decimal digits that start with 2164197 and end with 85824. The value analysis is right, of course. If the length argument to Frama_C_memcpy was 150, then that slice of memory contains zero. If the length was 10000, the numbers in d that were copied from s can be interpreted as representing exactly that 78800-bit number. This is the value analysis' way of telling you that it's either one or the other: either the length was 150 or it was 10000. This is more informative than telling you cell by cell that the cell contains zero or another number, because then you wouldn't see that it's either all one or all the other. The value analysis is rather cute that way.

Note: if the 78800-bit number explanation seems terribly complicated, do not worry, there will be a better explanation for this issue in a future post. We just need another new feature in order to be able to show you.

In conclusion, not everyone may want a Formula One car. But Frama_C_memcpy should at least work for completely unrolled deterministic programs, where it can speed up analysis quite a bit compared to the interpretation of a C memcpy.

Thanks to Julien for help with motorsports metaphors, and to Boris for help in improving Frama_C_memcpy (it used to be worse).

Saturday, January 29 2011

On memcpy (part 1: the source code approach)

memcpy() is one of the few functions standardized as part of C itself instead of an additional API. But that's not what makes it interesting from a static analysis point of view. I think what makes it interesting is that it is used often, and often for tasks that can be described in high-level terms, whereas its implementations range from the low-level to the horribly low-level.

Here is a typical memcpy implementation:

void *
memcpy(void *dst, void *src, size_t length)
  int i;
  unsigned char *dp = dst;
  unsigned char *sp = src;

  for (i=0; i<length; i++)
    *dp++ = *sp++;
  return dst;

And here are snippets of an analysis context that uses it (download the entire file):

struct S { int *p; int *q; int i; double d; int j; };
  struct S s, d;
  s.p = &a;
  s.q = Frama_C_nondet_ptr(&a, &b);
  s.i = Frama_C_interval(17, 42);
  s.j = 456;
  s.d = Frama_C_double_interval(3.0, 7.0);
  memcpy(&d, &s, sizeof(struct S));

Analyzing with frama-c -val .../builtin.c m.c produces the following information about the contents of s, which is exactly what we could have expected:

          s.p ∈ {{ &a ;}}
           .q ∈ {{ &a ; &b ;}}
           .i ∈ [17..42]
           .d ∈ [3. .. 7.]
           .j ∈ {456; }

On the other hand, the information for d is much less precise:

 d ∈ {{ garbled mix of &{a; b; } (origin: Misaligned {m.c:11; }) }} or UNINITIALIZED

Each field of d is only guaranteed to contain a value that was not computed from base addresses other than &a and &b. The fields of d are not even guaranteed to have been initialized. An ideally precise analysis would tell us that d contains the same values as those displayed for s.

Side remark: if you do run the analysis at home, you can ignore the messages about imprecise values for the moment. These are not alarms to act upon, but simply informational messages to help trace, when the value analysis is imprecise on large programs, where the imprecision starts.

If you have been reading this blog from the beginning, perhaps as a continuation from the tutorial, you may have noticed by now that whenever the value analysis is not precise enough, the first thing to try is option -slevel.

frama-c -val .../builtin.c m.c -slevel 100

          s.p ∈ {{ &a ;}}
           .q ∈ {{ &a ; &b ;}}
           .i ∈ [17..42]
           .d ∈ [3. .. 7.]
           .j ∈ {456; }
          d.p ∈ {{ &a ;}}
           .q[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32 
           .q[bits 8 to 15]# ∈ {{ &a ; &b ;}} misaligned 0%32 
           .q[bits 16 to 23]# ∈ {{ &a ; &b ;}} misaligned 0%32 
           .q[bits 24 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32 
           .i[bits 0 to 7]# ∈ [17..42] misaligned 0%32 
           .i[bits 8 to 15]# ∈ [17..42] misaligned 0%32 
           .i[bits 16 to 23]# ∈ [17..42] misaligned 0%32 
           .i[bits 24 to 31]# ∈ [17..42] misaligned 0%32 
           .d[bits 0 to 7]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 8 to 15]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 16 to 23]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 24 to 31]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 32 to 39]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 40 to 47]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 48 to 55]# ∈ [3. .. 7.] misaligned 32%64 
           .d[bits 56 to 63]# ∈ [3. .. 7.] misaligned 32%64 
           .j ∈ {456; }

Good news: all fields of d are now guaranteed to be initialized. In addition, the information available for each field is more precise than before. The information about fields .j and .p has been perfectly preserved during the memcpy from s to d, whereas fields .q, .i and .d are only known to contain several 8-bit slices of values. This is what the "misaligned ..." part of each line mean.

For instance, line ".i[bits 0 to 7]# ∈ [17..42] misaligned 0%32" uses this mention in order to avoid giving the impression that bits 0 to 7 of field .i contain [17..42]. Instead, there was a 32-bit value [17..42] somewhere in memory, and a 8-bit slice of that value was copied to the first 8 bits or .i.

The fact that all 4 (respectively 8) slices of each field are followed by the same "misaligned" mention, with the same numerical values, mean that all slices for that field could come from the same value and have been copied in order, with the first source slice going to the first destination slice, ... However, the value analysis is not able to guarantee that this is what happened. It has lost the information that bits 0 to 7 and bits 8 to 15 of .i come from the same value, which is why the contents of .i are printed thus instead of just displaying that .i is [17..42].

It is important not to stitch together, say, bit 0 to 7 of {{ &a ; &b ;}} misaligned 0%32 and bit 8 to 31 of {{ &a ; &b ;}} misaligned 0%32 into {{ &a ; &b ;}} when you do not know that they come from the same value. Consider the example below:

  int a, b;
  int *p = Frama_C_nondet_ptr(&a, &b);
  int *q = Frama_C_nondet_ptr(&a, &b);
  *(char*)&p = *(char*)&q;

After analyzing this piece of code, the value analysis predicts for p:

          p[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32 
           [bits 8 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32 

It would be incorrect to predict that p contains {{ &a ; &b ;}} here, because the first call to Frama_C_nondet_ptr may return &a while the second one returns &b. Without the information that the two slices originate from the same value, it is incorrect to stitch them together as if they did.

The value analysis does better when there is only one possible concrete value inside the set at hand. Fields .p and .j in the original program, although they were copied char by char, could be reconstituted because the value analysis knows that there is only one possible way to be {{ &a ; }} (or { 456; }), which implies that properly ordered slices when put side by side can only rebuild &a.

This memcpy, if it is called with large values of length in the program, and the program is analyzed with the -slevel option, may occupy a large fraction of the analysis time. This is because although the analyzer does not keep the information that the small bites of values come from the same value, which it could use, it keeps the information that there exist plenty of equalities between source and destination chars — and does not use them. This should be considered a known issue with an open-ended fix date.

And this performance issue naturally brings up another performance issue with which it is suitable to conclude this first part of the memcpy discussion. At the other end of the spectrum, what if the user finds the analysis of memcpy too slow even without -slevel (or with -slevel-function memcpy:0 to force a rough analysis of memcpy)?

To this user, we say: replace your memcpy function by the one below.

void *
memcpy(void *dst, void *src, size_t length)
  unsigned char *dp = dst;
  unsigned char *sp = src;
  for (;Frama_C_interval(0,1);)
      dp[Frama_C_interval(0,length-1)] = sp[Frama_C_interval(0,length-1)];
  return dst;

While it it may not be immediately clear why one would want to replace the original, executable memcpy with this non-executable one, we can at least notice that all the executions of the original memcpy are captured by this one. At each iteration of the original memcpy, the condition is either 0 or 1, and one of the characters sp[Frama_C_interval(0,length-1)] is copied to dp[Frama_C_interval(0,length-1)].

Now, to investigate why analyzing this function is faster, let us look at the states propagated by the analyzer when analyzing the for loop of the original memcpy. It starts the loop with:

        i ∈ {0; }
        dp ∈ {{ &d ;}}
        sp ∈ {{ &s ;}}
        d completely uninitialized

Here is the sequence of states at the point after another char has been copied, before dp, sp and i are incremented:

        i ∈ {0; }
        dp ∈ {{ &d ;}}
        sp ∈ {{ &s ;}}
        d.p[bits 0 to 7]# ∈ {{ &a ;}} misaligned 0%32 
         {.p[bits 8 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ {0; 1; }
        dp ∈ {{ &d + {0; 1; } ;}}
        sp ∈ {{ &s + {0; 1; } ;}}
        d.p[bits 0 to 15] ∈
         {{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.p[bits 16 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ {0; 1; 2; }
        dp ∈ {{ &d + {0; 1; 2; } ;}}
        sp ∈ {{ &s + {0; 1; 2; } ;}}
        d.p[bits 0 to 23] ∈
         {{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.p[bits 24 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ {0; 1; 2; 3; }
        dp ∈ {{ &d + {0; 1; 2; 3; } ;}}
        sp ∈ {{ &s + {0; 1; 2; 3; } ;}}
        d.p ∈
         {{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.q; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ {0; 1; 2; 3; 4; }
        dp ∈ {{ &d + {0; 1; 2; 3; 4; } ;}}
        sp ∈ {{ &s + {0; 1; 2; 3; 4; } ;}}
        d{.p; .q[bits 0 to 7]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.q[bits 8 to 31]; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ [0..15]
        dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; } ;}}
        sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; } ;}}
        d{.p; .q[bits 0 to 15]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.q[bits 16 to 31]; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ [0..16]
        dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; 6; } ;}}
        sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; 6; } ;}}
        d{.p; .q[bits 0 to 23]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.q[bits 24 to 31]; .i; .d; .j; } ∈ UNINITIALIZED

        i ∈ [0..23]
        dp ∈ {{ &d + [0..7] ;}}
        sp ∈ {{ &s + [0..7] ;}}
        d{.p; .q; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.i; .d; .j; } ∈ UNINITIALIZED

        i ∈ [0..23]
        dp ∈ {{ &d + [0..8] ;}}
        sp ∈ {{ &s + [0..8] ;}}
        d{.p; .q; .i[bits 0 to 7]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.i[bits 8 to 31]; .d; .j; } ∈ UNINITIALIZED

        i ∈ [0..23]
        dp ∈ {{ &d + [0..15] ;}}
        sp ∈ {{ &s + [0..15] ;}}
        d{.p; .q; .i; .d[bits 0 to 31]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.d[bits 32 to 63]; .j; } ∈ UNINITIALIZED

        i ∈ [0..23]
        dp ∈ {{ &d + [0..16] ;}}
        sp ∈ {{ &s + [0..16] ;}}
        d{.p; .q; .i; .d[bits 0 to 39]; } ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
         {.d[bits 40 to 63]; .j; } ∈ UNINITIALIZED

        i ∈ [0..23]
        dp ∈ {{ &d + [0..23] ;}}
        sp ∈ {{ &s + [0..23] ;}}
        d ∈
         {{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED

In contrast with this long sequence of abstract states going through the same point of the for loop until a fixpoint is finally found, the analysis of the modified memcpy reaches the same abstract state in one iteration, realizes that it is a fixpoint and goes on with its life:

        dp ∈ {{ &d ;}}
        sp ∈ {{ &s ;}}
        d ∈
         {{ garbled mix of &{a; b; } (origin: Merge {mfast.c:32; }) }} or UNINITIALIZED

In short, the modified memcpy voluntarily loses information so that finding the loop's fixpoint becomes straightforward. The analysis of the original memcpy has to discover the fixpoint by gradually losing information and seeing what values stick. In fact, the process of discovering the fixpoint may overshoot, and end up less precise as well as slower than the improved memcpy. For imprecise analyses of programs that use memcpy, when the user is interested in the verification of the program itself rather than memcpy, it is recommended to use the replacement version.