# Frama-C news and ideas

## Short, difficult programs

When researchers start claiming that they have a sound and complete analyzer for predicting whether a program statement is reachable, it is time to build a database of interesting programs.

## Goldbach's conjecture

My long-time favorite is a C program that verifies Goldbach's conjecture (actual program left as an exercise to the reader).

If the conjecture is true, the program never terminates (ergo a statement placed after it is unreachable). If the conjecture is false, the program terminates and moves on to the following statement. The program can be implemented using `unsigned long long` integers that should be good for counter-examples up to 18446744073709551615. It can alternately use dynamic allocation and multi-precision integers, in which case, depending whether your precise definition of “analyzing a C program” includes out-of-memory behaviors, you could claim that the reachability of the statement after the counter-example-finding loop is equivalent to the resolution of “one of the oldest and best-known unsolved problems in number theory and in all of mathematics”.

## Easier programs than that

No-one expects the resolution of Goldbach's conjecture to come from a program analyzer. This example is too good, because it is out of reach for everyone—it has eluded our best mathematicians for centuries. What I am looking for in this post is easier programs, where the solution is just in reach. Examples that it would genuinely be informative to run these sound, complete analyzers on. If they were made available.

### With integers

For 32-bit ints and 64-bit long longs, I know that the label `L` is unreachable in the program below, but does your average sound and complete program analyzer do?

```/*@ requires 2 <= x ;
requires 2 <= y ; */
void f(unsigned int x, unsigned int y)
{
if (x * (unsigned long long)y == 17)
L: return;
}
```

The key is that with the aforementioned platform hypotheses, the multiplication does not overflow. Label `L` being reached would mean that the program has identified divisors of the prime number `17`, which we don't expect it to.

In the program below, the multiplication can overflow, and not having tried it, I have truly no idea whether the label `L` is reachable. I expect it is, statistically, but if you told me that it is unreachable because of a deep mathematical property of computations modulo a power of two, I would not be shocked.

```/*@ requires 2 <= x ;
requires 2 <= y ; */
void f(unsigned long long x, unsigned long long y)
{
if (x * y == 17)
L: return;
}
```

### With floating-point

Floating-point is fun. Label `L` in the following program is unreachable:

```/*@ requires 10000000. <= x <= 200000000. ;
requires 10000000. <= y <= 200000000. ; */
void sub(float x, float y)
{
if (x - y == 0.09375f)
L: return;
}
```

Frama-C's value analysis, and most other analyzers, will not tell you that label `L` is unreachable. It definitely looks reachable, The difference between `x` and `y` can be zero, and it can be `1.0`. It looks like it could be `0.09375` but it cannot: the subtracted numbers are too large for the difference, if non-zero, to be smaller than `1.0`.

So the subtlety in the example above is the magnitude of the arguments. What about smaller arguments then?

```/*@ requires 1.0 <= x <= 200000000. ;
requires 1.0 <= y <= 200000000. ; */
void sub(float x, float y)
{
if (x - y == 0.09375f)
L: return;
}
```

This time the label `L` is easily reachable, for instance with the inputs `2.09375` for `x` and `2.0` for `y`.

```/*@ requires 1.0 <= x <= 200000000. ;
requires 1.0 <= y <= 200000000. ; */
void sub(float x, float y)
{
if (x - y == 0.1f)
L: return;
}
```

The target difference `0.1f` is larger than `0.09375f`, so it should be easier to obtain as the difference of two floats from the prescribed range, right? In fact, it isn't. The numbers `0.099999904632568359375` and `0.10000002384185791015625` can be obtained as values of `x-y` in the above program, for the former as the result of picking `1.099999904632568359375` for `x` and `1.0` for `y`. The value `0.1f`, on the other hand, cannot be obtained as the subtraction of two floats above `1.0`, because some bits are set in its significand that cannot be set by subtracting floats that large.

## Conclusion

I expect it will be some time before automatic analyzers can soundly and completely decide, in all two-line programs, whether the second line is reachable. Frama-C's value analysis does not detect unreachability in any of the programs above, for instance (the value analysis is sound, so it soundly detects that `L` may be reachable when it is). Please leave your own difficult two-line programs in the comments.

The floating-point examples in this post owe quite a bit to my colleague Bruno Marre explaining his article “Improving the Floating Point Addition and Subtraction Constraints” to me over lunch.

1. On Monday, November 5 2012, 11:13 by bb

PathCrawler is a test generation tool developped at CEA and
now integrated in the Frama-C platform.

I tried the examples provided by pascal in PathCrawler and here are the results :
(since PathCrawler does not yet deal with ACSL specifications, I use the pathcrawler_assume
construction to provide constraints on inputs)
---------------------------------------------------------------------
/*@ requires 2 <= x ;
requires 2 <= y ; */
void f1(unsigned int x, unsigned int y)
{ pathcrawler_assume(2 <= x);
pathcrawler_assume(2 <= y);

if (x * (unsigned long long)y == 17)
L: return;
}

PathCrawler concludes that L is unreachable (the only path containing L is infeasible)

---------------------------------------------------------------------
/*@ requires 2 <= x ;
requires 2 <= y ; */
void f2(unsigned long long x, unsigned long long y)
{
pathcrawler_assume(2 <= x);
pathcrawler_assume(2 <= y);
if (x * y == 17) {printf("coucou\n");
L: return;}
}

PathCrawler provides a test case that reaches L :

x = 12494102768015372291;
y = 15099258249014797659;

---------------------------------------------------------------------
PathCrawler is based on the COLIBRI solver
which is also able to reason on floating point numbers.
We are currently adding the treatment of floats in the solver,
but today it only takes into account double arithmetic.
As a workaround, PathCralwer treats floats as doubles
and thus may provide wrong results but
when a solution is found by the solver,
PathCrawler confirms that solution with a concrete execution
allowing to reject these bad test cases.

------------------------------
void sub(float x, float y)
{
pathcrawler_assume(10000000. <= x);
pathcrawler_assume(10000000. <= y);
pathcrawler_assume(x <= 200000000.);
pathcrawler_assume(y <= 200000000.);
if (x - y == 0.09375f)
L: return;
}

On this example, the solver finds a double solution to reach
L, but this solution is not confirmed by execution and is rejected.
---------------------------------------------------------------------
/*@ requires 1.0 <= x <= 200000000. ;
requires 1.0 <= y <= 200000000. ; */
void sub1(float x, float y)
{
pathcrawler_assume(1.0 <= x);
pathcrawler_assume(1.0 <= y);
pathcrawler_assume(x <= 200000000.);
pathcrawler_assume(y <= 200000000.);
if (x - y == 0.09375f)
L: return;
}

On this example, PathCrawler provides a test case that reaches L
x= 451;
y= 450.90625;
---------------------------------------------------------------------
/*@ requires 1.0 <= x <= 200000000. ;
requires 1.0 <= y <= 200000000. ; */
void sub2(float x, float y)
{
pathcrawler_assume(1.0 <= x);
pathcrawler_assume(1.0 <= y);
pathcrawler_assume(x <= 200000000.);
pathcrawler_assume(y <= 200000000.);
if (x - y == 0.1f)
L: return;
}

On this example PathCrawler concludes that in double,
L would be unreachable.

---------------------------------------------------------------------
Following these results, Pascal asked me to try some variants :

A bigger prime number

/*@ requires 2 <= x ;
requires 2 <= y ; */
void f11(unsigned int x, unsigned int y)
{ pathcrawler_assume(2 <= x);
pathcrawler_assume(2 <= y);

if (x * (unsigned long long)y == 15485863)
L: return;
}

PathCrawler concludes L unreachable
---------------------------------------------------------------------
A reachable big value (product of two primes)

/*@ requires 2 <= x ;
requires 2 <= y ; */
void f12(unsigned int x, unsigned int y)
{ pathcrawler_assume(2 <= x);
pathcrawler_assume(2 <= y);

if (x * (unsigned long long)y == (44026308509))
L: return;
}

PathCrawler finds to reach L :
x = 2843;
y = 15485863;

2. On Monday, November 5 2012, 16:00 by Pascal

Vexed, I said: “how about (x * (unsigned long long)y == 90000000000077ULL) and (x * (unsigned long long)y == 2000000025000000077ULL) then?”

3. On Monday, November 5 2012, 17:18 by bb

Well, PathCrawler concludes that the first one
(x * (unsigned long long)y == 90000000000077ULL)
is unreachable quite easily.

Concerning the second one, I started the tool, but I have a bus to take,
may I answer only tomorrow ? or next year ? ...

4. On Monday, November 5 2012, 17:37 by Pascal

Indeed, determining whether x * y == C is feasible is easier than I thought for small constants C, for instance using constraint propagation techniques. I had not thought about it, but even the Frama-C plug-in that Sven Mattsen was developing during the summer of 2012 would automatically and rapidly find that x * y == 17 is unfeasible.

Actually factoring C is difficult for well-chosen composite numbers C of thousands of bits: a large part of public-key cryptography relies on this property. Factoring 64-bit numbers should be at least mildly challenging for a general-purpose code analyzer. An analyzer could use a probabilistic primality test to claim that the condition is unfeasible, but then such an analyzer would only be probabilistically sound.

5. On Tuesday, November 6 2012, 09:37 by bb

Finally, a few hours later, PathCrawler provides me a solution for
(x * (unsigned long long)y == 2000000025000000077ULL) :
x = 2000000011;
y = 1000000007;

The COLIBRI solver of course is not specialized to the factoring problem.
It is able to deal with non linear constraints but
in such a case where there is only one multiplicative constraint, it spends its time in the labelling phase
(it is a Constraint Logic Programming based solver), with no specific
labelling strategy. The labeling of one of the input variables is sufficient but
the search space is huge.