← Fun with constants Linux and floating-point: nearly there →

Better is the enemy of good... sometimes

Pascal Cuoq - 12th Sep 2011

This post is about widening. This technique was shown in the second part of a previous post about memcpy() where it was laboriously used to analyze imprecisely function memcpy() as it is usually written. The value analysis in Frama-C has the ability to summarize loops in less time than they would take to execute. Indeed it can analyze in finite time loops for which it is not clear whether they terminate at all. This is the result of widening the voluntary loss of precision in loops. As a side-effect widening can produce some counter-intuitive results. This post is about these counter-intuitive results.

The expected: better information in better information out

Users generally expect that the more you know about the memory state before the interpretation of a piece of program the more you should know about the memory state after this piece of program. Users are right to expect this and indeed it is generally true:

int u; 
main(){ 
  u = Frama_C_interval(20  80); 
  if (u & 1) 
    u = 3 * u + 1; // odd 
  else 
    u = u / 2; // even 
}

The command frama-c -val c.c share/builtin.c shows the final value of u to be in [10..241]. If the first line of main() is changed to make the input value range over [40..60] instead of [20..80] then the output interval for u becomes [20..181]. This seems normal : you know more about variable u going in (all the values in [40..60] are included in [20..80]) so you know more about u coming out ([20..181] is included in [10..241]). Most of the analyzer's weaknesses such as in this case the inability to recognize that the largest value that can get multiplied by 3 is respectively 79 or 59 do not compromise this general principle. But widening does.

Better information in worse information out

Widening happens when there is a loop in the program such as in the example below:

int u  i; 
main(){   
  u = Frama_C_interval(40  60); 
  for (i=0; i<10; i++) 
    u = (25 + 3 * u) / 4; 
}

$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [15..60]

Analyzing this program with the default options produce the interval [15..60] for u. Now if we change the initial set for u to be [25..60] an interval that contains all the values from [40..60] and more:

$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [25..60]

By using a larger less precise set as the input of this analysis we obtain a more precise result. This is counter-intuitive although it is rarely noticed in practice.

Better modelization worse analysis

There are several ways to make this unpleasant situation less likely to be noticed. Only a few of these are implemented in Frama-C's value analysis. Still it was difficult enough to find a short example to illustrate my next point and the program is only as short as I was able to make it. Consider the following program:

int i  j; 
double ftmp1; 
double Frama_C_cos(double); 
double Frama_C_cos_precise(double); 
#define TH 0.196349540849361875 
#define NBC1 14 
#define S 0.35355339 
double cos(double x) 
{ 
  return Frama_C_cos(x); 
} 
main(){   
  for (i = 0; i < 8; i++) 
    for (j = 0; j < 8; j++) 
      { 
        ftmp1 = ((j == 0) ? S : 0.5) * 
          cos ((2.0 * i + 1.0) * j * TH); 
	ftmp1 *= (1 << NBC1); 
      } 
}

If you analyze the above program with a smartish modelization of function cos() one that recognizes when its input is an interval [l..u] of floating-point values on which cos() is monotonous and computes the optimal output in this case you get the following results:

$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-3.40282346639e+38 .. 8192.]

Replacing the cos() modelization by a more naive one that returns [-1. .. 1.] as soon as its input set isn't a singleton you get the improved result:

$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-8192. .. 8192.]

This is slightly different from the [25..60] example in that here we are not changing the input set but the modelization of one of the functions involved in the program. Still the causes are the same and the effects just as puzzling when they are noticeable. And this is why until version Carbon only the naive modelization of cos() was provided.

In the next release there will be in addition to the naive versions Frama_C_cos_precise() and Frama_C_sin_precise() built-ins for when you want them. Typically this will be for the more precise analyses that do not involve widening.