Tag - restrict

Entries feed - Comments feed

Thursday, August 2 2012

restrict is not good for modular reasoning


There were quite a few posts recently that were concerned with ACSL, and a few more are planned for the near future. ACSL is a specification language for C (comparable with JML for Java, for those who know about JML). Some people call it a BISL, for “Behavioral Interface Specification Language”, but considering that this acronym's definition needs to be introduced everytime, on account of nobody in the world having ever seen it before, and that it has very little reuse potential, please forget that we mentioned it.

ACSL is Frama-C's annotation language, and it is partially implemented in Frama-C, but it is not intended to be just Frama-C's. According to this message, Microsoft Research's VCC now uses an annotation language with ACSL-like syntax and semantics.

ACSL allows to write function contracts, the stuff that modular reasoning about programs is made of. A function contract is supposed to tell you everything you need to know about the function, so that you do not need to look at its code. The function might still be unimplemented, even.

ACSL also allows to write annotations that go inside the function and are specific to one particular implementation. These annotations are typically hints to allow the verification that the specific implementation satisfies the contract. They can be, for instance, loop invariants or assertions.

It is good on several levels to be able to write and read properties about a C program in a formal language with unambiguous meaning. A compiler could parse and interpret ACSL to obtain information about the programmer's intentions, and take advantage of the opportunity to generate faster code. Hey, compiler makers, I said FASTER CODE.

The restrict type qualifier, again

Recently Stephen Canon reminded me that the good thing about restrict is that it is part of the function prototype, and I showed the differences with ACSL's approach.

Why does ACSL introduce an new notion, the \separated predicate, where C99 already has restrict? One reason that I pointed out is that while restrict offers a very concise syntax for some specialized uses, \separated is useful for other things that are not convenient to express with restrict.

Another reason is that a C prototype with restrict annotations still does not work very well as a contract. The restrict qualifier in the prototype implicitly refers to the function's code. This blog post expands on this.

The meaning of restrict

The restrict qualifier applied to a pointer more or less means that any offset of the pointer can only alias with an offset of the pointer. Please read C99's section for the exact definition. I will simply quote the example from paragraph 8 there:


The function parameter declarations in the following example

void f(int n, int * restrict p, int * restrict q)
  while (n-- > 0) 
  *p++ = *q++; 

assert that, during each execution of the function, if an object is accessed through one of the pointer parameters, then it is not also accessed through the other.


Note that the notion of any offset of a pointer p being separated from everything else that is not also an offset of p is a relative notion. In the absolute, p has to point somewhere. If for instance p points inside a global array t, then p aliases with t. The restrict qualifier only means that locally, inside the function f, for what f does, an offset of p does not alias with any lvalue not based on p. As the next paragraph in the standard puts it:


The benefit of the restrict qualifiers is that they enable a translator to make an effective dependence analysis of function f without examining any of the calls of f in the program. The cost is that the programmer has to examine all of those calls to ensure that none give undefined behavior. For example, the second call of f in g has undefined behavior because each of d[1] through d[49] is accessed through both p and q.

void g(void) 
  extern int d[100]; 
  f(50, d + 50, d); //valid 
  f(50, d + 1, d); //undefined behavior 


The call f(50, d + 50, d) is valid because although offsets of d+50 can obviously alias with offsets of d, they do not alias inside the function for what the function does. When calling f(50, d + 1, d);, offsets of the two restrict-qualified pointers do alias, and the call invokes undefined behavior.

No modular reasoning

And this sums up why restrict alone would not be good for modular reasoning about programs. It is part of the prototype, and the prototype is a kind of limited contract, but the exact meaning of restrict depends on the function's implementation. If the function computes p + 49, then p + 49 is part of the offsets of p that must not alias with the rest of the memory locations accessed by f. If f accesses the global variable G by name, then p must not point to G. if f does not access G by name, then p is allowed to point to G. It is only one particular implementation of f that gives meaning to restrict.

By contrast, a proper contract is supposed to be self-contained. The ACSL precondition \separated(p+(0..n-1), q+(0..n-1)); tells the programmer exactly how the arguments p and q are allowed to alias. An implementation for f needs not exist for the programmer to start using it with the confidence that ey is calling f properly.

Wednesday, July 25 2012

The restrict qualifier as an element of specification

An insightful interjection

Stephen Canon, whose explanations about the fine points of floating-point I have appreciated on StackOverflow, chimed in on the subject of the restrict qualifier. With minor edits for consistency with this blog's formatting:

Consider memcpy(); its arguments are declared restrict. This not only means that the source and destination pointers cannot be the same (which is what Pascal's "replacement" says), but also means that the buffers cannot overlap at all (i.e. the src pointer cannot be anywhere in the range [dst-(length-1), dst+(length-1)]).

restrict also has the virtue of making this contract apparent to your callers, as it's part of the API. People should know that they can't call memcpy() with overlapping buffers, and use memmove() instead. If the information is instead encoded in a cryptic line in the implementation, then it isn't obvious to a developer who intends to use the function.

Non-aliasing granularity

I noted that my not-really-serious proposal did not have the same granularity as the original, with both good and bad consequences. An advantage of my proposal is that it is possible to specify only that p does not alias with q and that r does not alias with t. As defined in C99, restrict is more concise, but this also means that there are some properties you might want to specify and that you cannot.

restrict as an element of specification

But I failed to realize that restrict, being a type qualifier (for the casual C/C++ programmer: like const), is part of the function prototype. I was only thinking of the opportunity for better code generation inside f(), but it would theoretically be possible to have compilers that warn about passing aliasing pointers to f().

That's the theory anyway. In practice the situation is even worse than for const. For const you get type-system level warnings about casts discarding the qualifier. The warnings are sometimes useful and sometimes annoying. For restrict I do not know of any tool that makes any check of proper use at the call site.

An alternative

One tool that comes close to checking for proper use of functions with const and restrict-like specifications is developed nearby (have you heard of it?). One drawback is that it is work in progress. And it uses its own syntax for function specifications.

Here is the specification for memcpy() in that system:

/*@ requires \valid(((char*)dest)+(0..n-1))
  @          && \valid(((char*)src)+(0..n-1));
  @ requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));
  @ assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];
  @ ensures memcmp((char*)dest,(char*)src,n) == 0 && \result == dest;
extern void *memcpy(void *restrict dest,
                    const void *restrict src, size_t n);

Yes, it is longer than just sprinkling the prototype with const and restrict, and I did not even include the definition of the logic predicate memcmp. But on the other hand, what you have here is a complete specification of memcpy(). You can ask your worst enemy to write a function to this specification, and ey proves that eir implementation satisfies the specification, you can rest assured that, for all valid inputs, the function terminates and computes the right result. The worst your enemy could do would be to provide you with a function that is arbitrarily slower than necessary.

Side note: solutions to the problem of your worst enemy providing you with a function that is too slow are being researched, but even researchers working on that would agree that this field has not reached the maturity of the solutions to the problem of your worst enemy providing you with a function that functionally does the wrong thing.

I can provide a neat trick for verifying that a function executes in constant time, but this is only a very partial solution to the problem.

Enough about this hypothetical worst enemy. The const part of memcpy()'s specification, and more, is contained in the clause:

assigns ((char*)dest)[0..n-1] \from ((char*)src)[0..n-1];

The restrict part is contained in:

requires \separated(((char *)dest)+(0..n-1),((char *)src)+(0..n-1));

I might be biased, but I think it is just as clear, and separate concerns are better separated. The \separated predicate, for instance, is also useful for expressing exactly the conditions under which the statement (*p=*p) & (*q=*q); is defined.

The ACSL by example library contains more examples of function contracts.

On the redundancy of C99's restrict

The restrict keyword in C99

C99 introduced a restrict keyword. The intention is to let the programmer specify the absence of alias between some inputs of a function ey is writing.

Consider the function:

int f1(int * restrict p, int * restrict q)
  *p = 1;
  *q = 2;
  return *p + *q;

Thanks to the restrict keyword, GCC can compile function f1() into the following assembly code:

$ ~/gcc-172652/bin/gcc -O3 -std=c99 -S restr.c
$ cat restr.s
	movl	$1, (%rdi)
	movl	$2, (%rsi)
	movl	$3, %eax

Note that the -std=c99 is necessary. The code, parsed as a C89 program, is syntactically incorrect because restrict is not a keyword in C89.

In the generated x86_64 code, %rdi is the first argument p, %rsi is the second argument q, the parentheses dereference and the dollar sign indicates constants. By convention, register %eax contains the return value when the ret instruction is reached.

The above is pretty good code. Since the information is provided that pointers passed to function f1 for p and q never alias, the compiler knows that at the point of the return, *p must be 1, and therefore the returned expression evaluates to 3.

Contrast with the same function compiled without the restrict annotations, say, because all you have is a C89 compiler:

int f(int * p, int * q)
  *p = 1;
  *q = 2;
  return *p + *q;

	movl	$1, (%rdi)
	movl	$2, (%rsi)
	movl	(%rdi), %eax
	addl	$2, %eax

This generated assembly code is not as good. The compiler knows that *q can only be 2 at the point of the return statement, but it cannot be certain that *p contains 1, because the function could be passed the same address for both arguments. In order for the code to work in all cases, the compiler decided to reload *p from memory for the addition.

My proposal

I claim that the restrict keyword in C99 both lacks expressive power and is redundant with constructs that already existed in C89. There was no need to encumber the language with a new construct, especially a new keyword that breaks the compilation of existing code that uses a variable named restrict.

C89 already had undefined behavior, a powerful tool for instructing the compiler that it does not have to generate code that handles special cases.

A function f2() that works under the same hypotheses as f1() can be written in C89:

int f2(int * p, int * q)
  (*p=*p) & (*q=*q);
  *p = 1;
  *q = 2;
  return *p + *q;

The first statement in the body informs the compiler that f2() will never be called with aliasing arguments p and q. The compiler is then free to generate the same code as was generated for the restrict-using function f1(). It is completely legal for the compiler to assume that *p + *q at the return statement evaluates to 3 in the function above. GCC does not generate efficient code for the function f2() above, but it could. It generates the same code as for the plain function f(), correctly compiling (*p=*p) & (*q=*q); to nothing, but failing to take advantage of the freedom it provides to generate a function that does not work when p and q alias.

Wrapping up

My proposal is more flexible than restrict, allowing to specify fine-grained non-aliasing relations between specific pointers. It is slightly more verbose, especially when there are many pointers, but this is a normal downside of allowing to specify aliasing relations pointer by pointer. Note that the pairwise absence of alias between p, q and r can be specified with the compact (*p=*p) & (*q=*q) & (*r=*r);

Compiler makers, please support the (*p=*p) & (*q=*q); code annotation in your compilers.