Frama-C news and ideas

To content | To menu | To search | Frama-C Home

Tag - icpc2011

Entries feed - Comments feed

Sunday, June 12 2011

Final thoughts on the ICPC 2011 industrial challenge

I have received the review of my submission to the ICPC 2011 industrial challenge. If you have read the challenge description, you may remember that participating involved writing various fictional e-mails. In my submission, I skimped a bit on that part. Some of the e-mails I replaced by an e-mail to my fictional boss, explaining to him that my holidays had started and that he was in the best position to write them anyway. Which he would be. In fact, being my boss, he would know me well and prefer me never ever to write directly to important customers' CEOs.


That these e-mails were required tells something about the ICPC conference (that I have never attended and that I would not have thought of attending without this contest). "Comprehension" is the key word — with respect to what we are already concerned with in this community, anyway. It's not just about the programs, it's also about the humans that have to deal with the programs. In some ways, we completely ignore these aspects: we would like the person verifying the software to trust that it works in the same way that you may trust a counter-intuitive mathematical result with a proof, each step of which you can check is correct. I am naturally exaggerating our differences here: before the formal proof, someone has to have the intuition. We are not working in opposite directions, we are simply working each on difficult subjects that, until now, have been occupying all our respective attentions.


Even more telling is that in places were I did not think I was skimping, the organizers expected more human/economical/social/psychological thinking than I provided in good faith. Where the contest required an e-mail to:

RobotControllers.com Quality Assurance: Explain to this non-engineer how the code caused the cus- tomers’ problems, and what effect your bug fixes had on the program’s output. Instruct this person how to recognize symptoms of the bug that may be communicated by customers.

I simply wrote (people who have interacted with me on Frama-C's own issue tracker will recognize my patented style):

Issue 1:

Description: When the moveby command is used, the robot leg moves very slowly.

Workaround: Do not use the moveby command. Status: Fixed in last revision.

Issue 2:

Description: The robot leg does not stop jiggling when it hits the target angle. Workaround: No workaround known. Status: Fixed in last revision.

Issue 3:

Description: When issuing an order to move in a direction opposite to the one the leg is currently moving, destruction of the robot. Workaround: Wait for the completion of any command before issuing any new command that would cause a change in direction. Status: Fixed in last revision.

Issue 4:

Description: At the end of medium-range move orders, the leg stops moving abruptly. The robot smells slightly funny. Workaround: Issue only short-range orders where the leg does not build up speed, or long-range orders where the leg reaches its full speed. Status: Fixed in last revision.

The reviewer had the valid remarks below (I hope ey won't mind that I quote em here. We're bridging the gap between two communities! If we have to breach etiquette and quote excerpts of reviews that are supposed to remain confidential, so be it…)

the QA person needs advice on how to recognize when a new caller is reporting a similar bug; the explanations of the details were too vague to direct callers properly. QA also needs to know what to say to the customer when a bug is recognized: In addition to technical workarounds, is it possible to return the chip? Is there an alternative available? Is there a testing method suggested for discovering similar flaws at the customer site in the future? Should the customer be compensated for exploded controllers?


I will be attending ICPC 2011. I hoped Anne Pacalet would be there too (as I already pointed out, she implemented the code navigation features). She won't be able to come, but she wants me to take notes during Margaret Burnett's invited talk.

Thursday, June 9 2011

Fixing robots, part 2

This post follows that post.

For cases 2 and 3, in order to get more complete execution traces, we again take advantage of Frama_C_dump_each(). Before launching the analysis, we insert a call to this primitive right at the end of RoCo_process(), so that local variables of that function will be displayed too.

--- roco.c.orig	2011-05-16 13:53:57.000000000 +0200
+++ roco.c	2011-05-16 13:55:18.000000000 +0200
@@ -303,4 +303,5 @@
     else {
     	RoCo_engineVoltage = 0.0;
     }
+    Frama_C_dump_each();
 }

The static analysis done by the previously provided shell script (key command below) again confirms that the execution does not run into any of the C99 undefined behaviors that the value analysis detects (uninitialized access, use of a dangling pointer, overflows in signed integer arithmetics, invalid memory access, invalid comparison of pointers, division by zero, undefined logical shift, overflows in conversions from floating-point to integer, infinite or NaN resulting from a floating-point operation, undefined side-effects in expressions). Yes, I will now give the full list every time. One year ago I stumbled on a comparative study between PolySpace and Frama-C's value analysis that gave the full list of alarms for PolySpace but listed only two types of undefined behaviors for the value analysis. The author had lazily copy-pasted a sentence from the documentation's introduction, but had crucially omitted the ellipsis. As a consequence, I will now inflict the full list on everyone every time. Bwa-ha-ha. Also, I plan to add three more categories of detected undefined behaviors in the near future. Bwa-ha-ha-ha-ha.

frama-c ${FILES} ${TESTCASE} -val -slevel 50000 -unspecified-access -val-signed-overflow-alarms -calldeps -save state$1

Case 2

The trace corroborates the "does not stop jiggling when it hits the target angle" diagnostic. We concentrate on relevant lines of the log using the command:

grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log2

Here is the bit when the target for the first command is reached:

[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -29.9791361754
        RoCo_engineVoltage ∈ -0.700730737571
        lastTime ∈ {66520}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.0013178639
        RoCo_engineVoltage ∈ -0.573709804476
        lastTime ∈ {66540}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.0235995355
        RoCo_engineVoltage ∈ -0.469713860267
        lastTime ∈ {66560}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.0454486754
        RoCo_engineVoltage ∈ -0.384569182548
        lastTime ∈ {66580}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.0664935071
        RoCo_engineVoltage ∈ -0.314858616438
        lastTime ∈ {66600}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.0864858403
        RoCo_engineVoltage ∈ -0.257784432149
        lastTime ∈ {66620}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1052716542
        RoCo_engineVoltage ∈ 0.07897475281
        lastTime ∈ {66640}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1227678881
        RoCo_engineVoltage ∈ 0.354689853917
        lastTime ∈ {66660}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1389442061
        RoCo_engineVoltage ∈ 0.580426286282
        lastTime ∈ {66680}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1538087376
        RoCo_engineVoltage ∈ 0.765243645549
        lastTime ∈ {66700}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1673969897
        RoCo_engineVoltage ∈ 0.916559301284
        lastTime ∈ {66720}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1791116186
        RoCo_engineVoltage ∈ 1.04044608206
        lastTime ∈ {66740}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1885670036
        RoCo_engineVoltage ∈ 1.14187599937
        lastTime ∈ {66760}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.195537956
        RoCo_engineVoltage ∈ 1.22491979196
        lastTime ∈ {66780}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.1999193366
        RoCo_engineVoltage ∈ 1.29291029881
        lastTime ∈ {66800}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.2016944051
        RoCo_engineVoltage ∈ 1.34857621768
        lastTime ∈ {66820}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -30.2009101399
        RoCo_engineVoltage ∈ 1.39415161736
        lastTime ∈ {66840}
...
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -29.9951370606
        RoCo_engineVoltage ∈ 0.46939308602
        lastTime ∈ {67120}
...
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ -29.8107877453
        RoCo_engineVoltage ∈ -1.35491972386
        lastTime ∈ {67400}
...

The voltage is still negative when the target angle of -30 is reached. The voltage becomes positive around the angle of -30.1, but the leg is still moving down at that time. We attribute this phenomenon to inertia. The angle reaches about -30.2 before it finally stops decreasing and starts increasing again.

Using the same method as in case 1, it is possible to identify some variables that influence the choice of value for Roco_Enginevoltage. This allows us to determine a number of independent changes that each fix the problem.

The first possible change is to make the robot leg move back more slowly when it has overshot its destination. This fixes the issue, according to the Python script provided to be used as an oracle.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100
+++ roco_config_testcase_2.h	2011-05-19 20:15:56.000000000 +0200
@@ -35,7 +35,7 @@
 real64 RoCo_maxAngle_PARAM = 90.0;
 real64 RoCo_minAngle_PARAM = -90.0;
 real64 RoCo_shutdownTimeout_PARAM = 12.0;
-real64 RoCo_stepSpeed_PARAM = 10.0;
+real64 RoCo_stepSpeed_PARAM = 5.0;
 real64 RoCo_TempFltT_PARAM = 0.1;
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0;
 real64 RoCo_TimeSlopePos_PARAM = 1.0;

Another possible fix is to make the robot accept a larger error margin when deciding whether it is far enough of the intended angle to justify moving back.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100
+++ roco_config_testcase_2.h	2011-05-19 20:18:12.000000000 +0200
@@ -11,7 +11,7 @@
 real64 RoCo_angleAtMark_PARAM = 25.0;
 real64 RoCo_angleReachedThreshold1_PARAM = 2.0;
 real64 RoCo_angleReachedThreshold1Fast_PARAM = 4.0;
-real64 RoCo_angleReachedThreshold2_PARAM = 0.1;
+real64 RoCo_angleReachedThreshold2_PARAM = 0.3;
 real64 RoCo_batteryLowLimit_PARAM = 15.0;
 real64 RoCo_batteryLowDelay_PARAM = 60.0;
 boolean RoCo_checkBatteryVoltage_PARAM = TRUE;

Finally, my two personal favorite fixes are below. They are (most probably) completely wrong, but they do fix the problem according to the Python script oracle. The first makes the voltage filter so strong that the oscillations become less visible to the script:

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100
+++ roco_config_testcase_2.h	2011-05-19 20:19:10.000000000 +0200
@@ -39,7 +39,7 @@
 real64 RoCo_TempFltT_PARAM = 0.1;
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0;
 real64 RoCo_TimeSlopePos_PARAM = 1.0;
-real64 RoCo_voltageFilter_PARAM = 0.1;
+real64 RoCo_voltageFilter_PARAM = 0.7;
 
 real64 Engine_maxVoltage_PARAM = 18.0;
 real64 Engine_minVoltage_PARAM = -18.0;

My other favorite fix makes the voltage filter so weak that the motor no longer overshoots systematically when trying to correct its position.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100
+++ roco_config_testcase_2.h	2011-05-19 20:20:35.000000000 +0200
@@ -39,7 +39,7 @@
 real64 RoCo_TempFltT_PARAM = 0.1;
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0;
 real64 RoCo_TimeSlopePos_PARAM = 1.0;
-real64 RoCo_voltageFilter_PARAM = 0.1;
+real64 RoCo_voltageFilter_PARAM = 0.02;
 
 real64 Engine_maxVoltage_PARAM = 18.0;
 real64 Engine_minVoltage_PARAM = -18.0;

"Favorite" does not mean that I would recommend either of these two incompatible fixes. The first two seem more reasonable and less likely to break something else.

Case 3

According to the challenge description, we should use our "tacit knowledge gained from several years of employment at RobotControllers.com". Since we have not really been employed at RobotControllers.com for several years, it is only fair to use the tacit knowledge encoded in the provided acceptance Python script. This is probably what the organizers meant. Looking at the script, we see that the robot leg destruction is caused by a sudden variation in the voltage applied to the motor. In practice, the motor, when carried away by the inertia of the mechanism it is supposed to set in motion, becomes a generator. The difference between the voltage this generator provides and the voltage set by the environment should never be too large. A well-designed modern power supply can take the hit, but a large voltage difference also means large intensities going through the motors' coils. The motor turns into a generator, but it doesn't turn into a well-protected one.

The relevant part of the log is extracted as follows:

grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log3
...
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ -6.93042056043
        lastTime ∈ {109960}
        rampValue ∈ 0.998
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ -6.94303317304
        lastTime ∈ {109980}
        rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ -4.41559004995
        lastTime ∈ {110000}
        rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ -2.34629463843
        lastTime ∈ {110020}
        rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ -0.652098847809
        lastTime ∈ {110040}
        rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ 0.734991347706
        lastTime ∈ {110060}
        rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
        RoCo_engineVoltage ∈ 1.87064474807
        lastTime ∈ {110080}

I have included variable rampValue above because, as one can see by exploring the code in the GUI as for case 1, in addition to the voltage filter, the program has a mechanism to make the voltage increase slowly when transitioning from idle to active or from active to idle. Slice for the statement that computes the value of rampValue to see this mechanism more clearly for yourself: the command frama-c-gui -load state3 can be used to load the values previously computed. In the statement's contextual menu, first select "Enable slicing", and then "Slicing → Slice stmt".

slice_rampValue.png

The log extract above shows that unfortunately, this mechanism does not smooth the voltage when a new order arrives in the middle of the execution of the current order. Here, the motor has just ramped out to full speed (rampValue==1.0) when the command to move in the opposite direction arrives at lastTime==110000. The value of rampValue remains pegged at 1.0 during the transition. This could be considered to be the bug that causes the voltage jump.

Apart from the sleazy way we obtain the numerical limit by reverse-engineering the python script, it is impossible to argue against the inclusion of the patch below. Although it is written without understanding the root cause of the problem, it exactly prevents a situation that is never desirable, the destruction of the robot.

--- challenge/src/roco.c.orig	2011-05-16 13:53:57.000000000 +0200
+++ ppc/tests/ICPC/src/roco_sol3.c	2011-05-02 16:35:52.000000000 +0200
@@ -293,8 +293,17 @@
                 desiredEngineVoltage, Engine_maxVoltage_PARAM);
         limitationActive = (Engine_maxVoltage_PARAM == desiredEngineVoltage) ||
                              (Engine_minVoltage_PARAM == desiredEngineVoltage);
-        RoCo_engineVoltage = PT1_Filter (&voltageFilter, desiredEngineVoltage,
-                t13, dT);
+
+	{ real64 tentativeV = PT1_Filter (&voltageFilter, desiredEngineVoltage,
+					  t13, dT);
+	  if (tentativeV - RoCo_engineVoltage > 0.390625)
+	    RoCo_engineVoltage += 0.390625;
+	  else if (tentativeV - RoCo_engineVoltage < -0.390625)
+	    RoCo_engineVoltage -= 0.390625;
+	  else 
+	    RoCo_engineVoltage = tentativeV;
+	}
+
         wasInit = init;
         wasActive = RoCo_isActive;
         Timer_tick (&shutdownTimer, dT);

Even if/when the root cause of the problem is fixed, this patch should remain in place for the sake of defensive programming. It exactly prevents the voltage jumps that cause the destruction of the motor, and has almost no practical effects otherwise, so it does not hurt.

The best fix would perhaps be to do a ramp down followed by a ramp up when receiving, in the middle of the execution of an order, another order that causes a change in direction. This seems like a major change to the logic of the program, so we won't attempt it and we will instead rely on the patch above to prevent destruction of the robot leg.

After applying the patch above, we run and analyze testcase 3 again and we obtain another complaint from the validation script eval.py provided by the organizers:

src $ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3
make: Nothing to be done for `all'.
149100: TOO SLOW / TARGET NOT REACHED!
found 1 error(s)

To look on the bright side of life, the robot didn't explode.

The analysis log again conveniently provides all variables' values in a single (if longish) step:

$ grep "\\(RoCo_engineVoltage\\|lastTime\\|STATE\\|rampValue\\|legAngle \\)" log3 
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 19.970696807
        RoCo_engineVoltage ∈ 6.00755510873
        lastTime ∈ {115740}
        rampValue ∈ 0.874
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 20.0821939081
        RoCo_engineVoltage ∈ 5.99058545861
        lastTime ∈ {115760}
        rampValue ∈ 0.872
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 20.1933858392
        RoCo_engineVoltage ∈ 5.9736158
        lastTime ∈ {115780}
        rampValue ∈ 0.87
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 20.304272565
        RoCo_engineVoltage ∈ 5.95664613443
        lastTime ∈ {115800}
        rampValue ∈ 0.868
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 20.4148540546
        RoCo_engineVoltage ∈ 5.93967646317
        lastTime ∈ {115820}
        rampValue ∈ 0.866
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 20.5251302809
        RoCo_engineVoltage ∈ 5.92270678725
        lastTime ∈ {115840}
        rampValue ∈ 0.864
...

As illustrated in the log snippet above, the robot leg overshoots the target position. This can be attributed to the value of variable rampValue being too high, and specifically, decreasing too slowly. Since there is now a piece of code to limit voltage jumps, we can increase the speed at which this variable varies without fear of destroying the robot. In fact, for all we know, this variable was set to vary slowly in a misguided attempt to fix the explosion problem.

Since parameters RoCo_TimeSlopePos_PARAM and RoCo_TimeSlopeNeg_PARAM influence directly the computation of rampValue, we decide to adjust these:

--- roco_config_testcase_3.h	2011-03-23 11:07:48.000000000 +0100
+++ roco_config_testcase_3_sol3.h	2011-05-02 16:35:52.000000000 +0200
@@ -37,8 +37,8 @@
 real64 RoCo_shutdownTimeout_PARAM = 12.0;
 real64 RoCo_stepSpeed_PARAM = 8.0;
 real64 RoCo_TempFltT_PARAM = 0.15;
-real64 RoCo_TimeSlopeNeg_PARAM = 0.1;
-real64 RoCo_TimeSlopePos_PARAM = 0.1;
+real64 RoCo_TimeSlopeNeg_PARAM = 0.5;
+real64 RoCo_TimeSlopePos_PARAM = 0.5;
 real64 RoCo_voltageFilter_PARAM = 0.1;
 
 real64 Engine_maxVoltage_PARAM = 15.0;
$ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3 
...
found 0 error(s)


The ReadMe.txt reports that the field engineers also complained about voltage variations at time t=50. Since the voltage is perfectly stable around t=50s, we can interpret this part of their report as caused by the hallucinogenic fumes famously released by burning electronics. Perhaps they mean t=66140ms, when indeed the voltage can be seen to drop a bit sharply in the original execution trace. The reason why it drops sharply is clear on this selection of variable values:

[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 12.3506194964
        RoCo_engineVoltage ∈ 4.50997019947
        lastTime ∈ {66120}
        rampValue ∈ 0.614
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 12.4293066254
        RoCo_engineVoltage ∈ 4.52254504977
        lastTime ∈ {66140}
        rampValue ∈ 0.616
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 12.5082201017
        RoCo_engineVoltage ∈ 3.90576827098
        lastTime ∈ {66160}
        rampValue ∈ {0}
[value] DUMPING STATE of file roco.c line 306
        RoCo_legAngle ∈ 12.5873599253
        RoCo_engineVoltage ∈ 3.4007941544
        lastTime ∈ {66180}
        rampValue ∈ {0}

Variable rampValue drops from 0.616 to 0.0, and this cannot even be excused by the arrival of an order at that instant. It seems the ramping up/down part of the program might as well be thrown away and re-written from scratch.

Looking at values for instants 66140 and 66160 side-by-side, it becomes clear what changes and causes rampValue to jump to 0. 66140.png

The variable rampTarget changes from 1.0 to 0.0.

We could execute cycle 66160 in a debugger, but that would be banal. Instead, let us use a trick to observe that cycle on its own in Frama-C's GUI. We replace the line RoCo_process(); in main_testcase_3.c with:

                if (lastTime==66160) RoCo_process_66160(); else RoCo_process();

We then copy-paste the body of function RoCo_process() as the definition of an identical function RoCo_process_66160() and launch again the analysis. Yes, this means waiting ten minutes again. You have to make sacrifices to be cool. This maneuver allows us to observe cycle 66160 in the GUI, unmixed with the others, with a lot more code displayed in orange (indicating that this code was not executed at that cycle and can be ignored), and values that are specific for that cycle.

gui_66160.png

Listing the differences between this approach and using a debugger is left as an exercise for the reader. This inspection mostly confirms that the value 0.0 seems normal for variable rampTarget, and that it should be function Ramp_out() that makes sure that rampValue varies slowly even when rampTarget jumps around. Note in passing that the duplication of function RoCo_process() does not help to see what happens inside function Ramp_out() at cycle 66160. But you are probably growing a little restless, so I won't suggest to launch an analysis again. We can perfectly well infer what happens inside function Ramp_out() by reading the code. Doing that, we notice the tests below.

  if (data->dir == -1) {
    data->state = target;
  }
   ...
  if (data->dir == 1) {
    data->state = target;
  }

This doesn't make sense. There is never any reason to jump directly to the target. We suppress these lines, and the voltage jump around 66160 disappears.

Monday, June 6 2011

Fixing robots, part 1

This blog post is a revised version of part of my submission to the ICPC 2011 Industry Challenge. Please, go ahead and read the challenge description. I could only paraphrase it without adding anything to it, and so I won't.

The study was made with the April development version of Frama-C, which differs from the last released version in a number of bugfixes (list of reported issues) and minor new features. Specifically, it uses the value analysis and dependencies computations (documented in this manual) and code navigation features. The latter reuse building blocks that were implemented for slicing.

The ever important first step: identify missing library functions

The first step, as the faithful reader of this blog would now know, before using the value analysis for anything is to identify library functions the program depends on:

frama-c impls.c main.c roco.c sim.c main_testcase_1.c -metrics -cpp-command "gcc -DTESTCASE=1 -C -E"

...
Undefined functions (10):
 __builtin_inf  (15 calls); __builtin_inff  (15 calls); fprintf  (2 calls);
 __swbuf  (10 calls); __builtin_fabsf  (15 calls); exp  (1 call);
 __builtin_infl  (15 calls); __builtin_fabsl  (15 calls);
 __builtin_fabs  (15 calls); fabs  (9 calls); 
...

The host computer's standard headers have been used, hence the confusing __builtin_ prefixes. Better not rely on them: the robot's operating system probably bears little resemblance to the host's. Placeholder headers provided with Frama-C can be used instead:

frama-c impls.c main.c roco.c sim.c main_testcase_1.c -metrics -cpp-command "gcc -DTESTCASE=1 -C -E -nostdinc -I. -I/usr/local/share/frama-c/libc"

...
Undefined functions (3):
 exp  (1 call); fabs  (9 calls); fprintf  (2 calls); 
...

Standard functions exp() and fabs() are missing from the Carbon release, but have been added to the development version. With the development version, one simply needs to list /usr/local/share/frama-c/libc/math.c as a file of the analysis project. Function exp() is implemented as a value analysis builtin, taking advantage of the host's own exp() function.

If you wish to reproduce at home, you have to provide your own implementation for exp(). It may not need to be very accurate, but it needs to be analyzable with Carbon's value analysis. A good way would be to clean up the implementation from Sun that everyone is using. As I already said in previous posts, accessing the bits of a floating-point representation directly is so 1990s. If someone removed these ugly and obsolete manipulations, we would end up with a better implementation for exp() (and it would be analyzable with the value analysis from Carbon).

For fabs(), you can use the function below. An implementation from a library would probably unset the sign bit in the IEEE 754 representation but, again, Carbon's value analysis doesn't handle this precisely yet. The implementation below is handled precisely in the sense that a singleton for the input x results in a singleton for the returned value.

double fabs(double x){
  if (x==0.0) return 0.0;
  if (x>0.0) return x;
  return -x;
}

Finally, calls to fprintf(), that do not have any repercussions on the continuing execution, can be replaced with calls to the Frama_C_show_each() built-in. We can let the pre-processor to do this for us, passing the option -Dfprintf=Frama_C_show_each to GCC when pre-processing.

Looking for unspecified/undefined behaviors

We are now ready to launch the value analysis. Since we anticipate testcases 2 and 3 to be very similar, we write a short script:

#!/bin/bash

export CPP="gcc -Dfprintf=Frama_C_show_each -DTESTCASE=$1 -C -E -nostdinc -I. -I/usr/local/share/frama-c/libc"
FILES="/usr/local/share/frama-c/libc/math.c impls.c main.c roco.c sim.c"
TESTCASE="main_testcase_$1.c"

exec frama-c ${FILES} ${TESTCASE} -val -slevel 50000 -unspecified-access -val-signed-overflow-alarms -calldeps -save state$1

About ten minutes after running script.sh 1 > log1, we obtain a log (long:~70MiB) and a state (~1MiB). The log is long because it contains progression messages that are only intended to help identify analysis issues, and can be ignored most of the time. The state contains all the information that has been computed about the program, including values of variables, in a very compact format.

The analysis is completely unrolled (because of option -slevel 50000) and precise until the end. This means that the value analysis has in effect simulated the execution of the program with the inputs provided in main_testcase_1.c.

The log, despite its size, does not warn about any of the undefined or unspecified behaviors that the value analysis is guaranteed to identify (uninitialized access, use of a dangling pointer, overflows in signed integer arithmetics, invalid memory access, invalid comparison of pointers, division by zero, undefined logical shift, overflows in conversions from floating-point to integer, infinite or NaN resulting from a floating-point operation, undefined side-effects in expressions). This is very important. It means that we can rest assured that the strange dynamic behavior we are going to investigate is not caused by the misuse of one of C's dangerous constructs. Nothing would be more frustating than having to track the value of a variable which, according to the source code, is not supposed to change, but is modified through a buffer overflow. The value analysis guarantees we won't have to do that for this execution.

What would of course be better would be to verify that there can be none of the above undefined behaviors for any command sequence. This would be a much stronger result, but would also require a lot more work. When simulating a single execution inside the analyzer, we only check that that particular execution is free of undefined behavior, but it does not require any work from us (only a few minutes of work from the computer).

Exploiting the value analysis' results for program comprehension

The values computed and stored in state1 can be observed in Frama-C's GUI, using the command-line frama-c-gui -load state1. The GUI can also be used to identify the definition site(s) of a variable's value: select the variable at the program point you are interested in, and in the contextual menu, invoke Dependencies → Show defs.

Here, it is the value of variable RoCo_engineVoltage as displayed by the call to fprintf() (that we transformed into a call to Frama_C_show_each()) that is wrong, so we request the definition site(s) of that value:

s1.png

The GUI has pointed us to a call to function RoCo_Process() (using the yellow mark), so we now request the definition sites of RoCo_engineVoltage by the return; statement of that function. We obtain the two sites identified below:

s2.png

The condition that decides which branch is executing is the one shown in the screenshot below.

s3.png

The value analysis tell us the value of RoCo_isActive can be either 0 or 1 at this point during execution, but this variable is one of the variables whose value is printed in the analysis log, and its value was 1 at the instant we are interested in. We therefore focus on the definition site where the value assigned to RoCo_engineVoltage is computed in a call to PT1_Filter().

The dependencies of the particular call to PT1_Filter() we are interested in were computed by option -calldeps and can be found in the log. The call we are interested in is at statement 433. The log contains:

call PT1_Filter at statement 433:
 voltageFilter FROM state; x; t1; dt; voltageFilter
 \result FROM state; x; t1; dt; voltageFilter

Apart from reading its arguments state, x, t1, and dt, the call accesses a static variable voltageFilter. The address of voltageFilter is taken, so we have to be careful: this variable could be modified erroneously through a pointer (although the address-taking appears to be only to pass it to PT1_Filter(), which is innocent enough).

In fact, at this point, we have no idea which of the variables involved in the computation of the result of this call to PT1_Filter() is wrong. Clicking on a variable in the GUI provides the set of values for this variable at this program point, but this is still too imprecise here, since it mixes all 10000 or so passages through the statement.

Let us take advantage of the "blackboard" structure of the analyzed program and dump the entire program state at this statement, by inserting a call to Frama_C_dump_each(). See this previous post for a list of advantages of this built-in function over printf() or a debugger.

--- roco.c	(revision 12956)
+++ roco.c	(working copy)
@@ -293,6 +293,7 @@
                 desiredEngineVoltage, Engine_maxVoltage_PARAM);
         limitationActive = (Engine_maxVoltage_PARAM == desiredEngineVoltage) ||
                              (Engine_minVoltage_PARAM == desiredEngineVoltage);
+	Frama_C_dump_each();
         RoCo_engineVoltage = PT1_Filter (&voltageFilter, desiredEngineVoltage,
                 t13, dT);
         wasInit = init;

We need to launch the analysis again and find something to do for 10 minutes. This is a good time to start looking at bugs 2 and 3.

The log now contains state dumps for each passage through the statement where RoCo_engineVoltage is computed.

According to the ReadMe.txt in the challenge package, an order is given at t=50s. The log shows that this order fails to be executed speedily. The state dump at which lastTime contains 50000 and the next few ones show that of RoCo_engineVoltage's dependencies, variable desiredEngineVoltage is the one with the suspicious value: it is only -0.8, whereas parameters in file roco_config_testcase_1.h and values of the variable elsewhere in our log show that this voltage can go much higher. We are therefore left with the sub-problem of identifying why this variable has this value at this program point.

We use the same tools we have already used for RoCo_engineVoltage, this time applied to variable desiredEngineVoltage and this program point. The screenshot below shows the definitions sites for that value.

s4.png

The value of variable desiredEngineVoltage is defined by the call to function Limiter_Out(), whose argument is in turn defined by the call to Interpolate_from_curve() above.

Option -calldeps computed the implicit inputs of this call, which can be found in the log:

call Interpolate_from_curve at statement 423:
 \result FROM curve; x;
              EngineSpeedToVoltage_CURVE{.numPoints; .x[0..4]; .y[1..3]; }

The state dump in which lastTime==50000 shows that a low value for angleDiffRequest is the cause for the low value of desiredEngineVoltage.

The "Show defs" action in the GUI finds three possible definition sites for this value of angleDiffRequest, shown in the screenshots below.

s5.png

s6.png

s7.png

We find in the log that variable rampValue remains at 0 in the cycles that follow instant 50000. The value we observe for angleDiffRequest is compatible with the algorithm and values of variables at lines 264-277 of file roco.c. So it looks like the cause of the issue is the value of variable rampValue. Action "Show defs" in the GUI indicates that this value is computed by the call to Ramp_out() at line 240. The value of rampTarget is computed as 0.0 or 1.0 from a number of variables, and of these variables, RoCo_legAngleValid was always 1, and direction was always 0 or 1. The latter is suspicious, since in this execution, orders are given to move in both directions:

s8.png

The command "Show defs" applied to variable direction shows that indeed, the variable may have been set to 0 or 1 in three different sites.

s9.png

The site that corresponds to the MoveByAngle command, the middle one in the screenshot above, is suspicious: in the file main_testcase_1.c, the angle passed to this command is negative. This is not just a strange convention, because the computation below for the third definition site determined the direction for the other commands, that are all in the opposite direction, and variable direction was assigned 1 again there.

This suggests the fix below.

--- roco.c	(revision 13019)
+++ roco.c	(working copy)
@@ -211,7 +211,7 @@
                     direction = 0;
                 }
                 else {
-                    direction = (RoCo_desiredDeltaAngle > 0.0) ? -1 : 1;
+                    direction = (RoCo_desiredDeltaAngle > 0.0) ? 1 : -1;
                 }
                 RoCo_commandMoveByAngle = FALSE;
             }


Looking at the source code for related issues, one may notice that the value given to direction is also affected by the piece of code below. It would be worth the time testing different values of t9 and RoCo_hasMinMaxAngles_PARAM, although that complex computation is only active when using commands other than MoveByAngle.

            if ((fabs(t9) > t11) && (direction == 0)) {
                direction = ((((!RoCo_hasMinMaxAngles_PARAM) ||
                               (fabs(t9) >= 180.0))
                              ? t9 : -t9) > 0.0) ? 1 : -1;
            }

Thanks, and Next

My colleagues Benjamin Monate and Virgile Prevosto found the challenge and set up the files in the development repository, so that the only thing that was left to do was to identify the bugs, and started looking at them with me. Anne Pacalet and David Mentré have provided feedback for transforming this submission into a blog post. Anne Pacalet also implemented of all the code navigation features used in this study. The organizers Jochen Quante and Andrew Begel have done an impressive job of making this challenge interesting, and indeed, challenging.

I have also submitted my explanation of tasks 2 and 3 in the challenge. However, I thought someone might want to look at these themselves now that I have shown how easy it is on task 1. The only difficult part is finding an exp() function.

To be continued...

Saturday, May 21 2011

Exploding robots

I should have said this earlier — others might have wanted to join in the fun — but I will be sending an entry to participate in the International Conference on Program Comprehension's Industrial Challenge. My write-up will appear here shortly after the submission deadline.