Frama-C news and ideas

To content | To menu | To search | Frama-C Home

str2long

A coding contest of weeks past

A fortnight ago, I sent an entry in this Quick Coding Contest. The objective of the contest is to produce a C function to convert a string to the integer it represents. Now, it turns out that my submission generates the shortest code after libc, whatever that means. I am not sure why that is. Other submissions duplicate more code than mine, but I read yet more submissions that felt just as short as mine.

Rather than the code duplication, I could tease Robert Graham for using type int for indexes into a char array, which can lead to incorrect behavior for strings of more than 2GiB on 64-bit and even on 32-bit architectures:

long str2long_robert_2(const char *str)
{
    long result = 0;
    int i;
…
       for (i=1; str[i]; i++) {
            int c = str[i];

The proper type to use in C is size_t, I think. In fact, ptrdiff_t would feel just as natural to me, although that would take us back to the issue with 2.5GiB strings and 32-bit architectures.

There are good reasons to forbid objects of more than 2GiB on 32-bit architectures (making the use of int as index in a char array safe in that case). But on my system, malloc() still lets me allocate 2.5GiB without a peep. Actually, I didn't change systems since the last time ptrdiff_t was discussed in this blog.

I do not have the heart to tease, because writing with high-level idioms portable across languages is a laudable goal. Still, if I was going to use a higher-level language with no unsigned type, I would compute using the multiprecision integer type that any high-level language should have. C does not have native multiprecision integers and it has unsigned types. I feel that using the latter is acceptable when it compensates for the absence of the former.

Making an even shorter str2long() function

My submission looks like this:

long str2long_pascal (const char *p)
{
  unsigned long l = 0;
  int neg = 0;
  if (!*p) goto err;
  if (*p == '-')
    {
      p++;
      neg = 1;
    }
 loop:
  if (*p < '0' || *p > '9') goto err;
  …

One nice trick I saw in other submissions was to use neg = (*p == '-'); instead of what I wrote. I do not know whether compilers, when provided with my version, are typically able to simplify it into this one. It would be a sophisticated optimization, but sounds plausible.


I really wanted to remark that statement if (!*p) goto err; in my submission was completely useless. It is intended to ensure that the empty string input is mapped onto the error case. If the statement was removed, in the case of the empty input, the test for '-' would fail, and next control would go to if (*p < '0' || *p > '9') goto err;, which would send execution to label err just the same.

I am pretty sure that no compiler knows how to detect this second trick by itself.

What can I say in guise of excuse? I spend a lot of time writing in OCaml, where recursion and pattern-matching are more idiomatic. Yes, I think this is a good excuse, and I will explain why at length in a subsequent blog post.

Also, I noticed the uselessness of the empty string test while the contest was still open, but I wanted to keep my impressive score on metrics other than code size and speed, as bragged about in the next section.

Other metrics the contest should definitely be judged on

I was a little disappointed to see that code size and speed were used for the first evaluation of the submissions, because I optimized for completely different criteria.

  • Least number of structured loops: I hoped there would be a special prize for this easily measurable and totally objective criterion.
  • Time spent: I submitted my function after spending 27 minutes on it in total. This may seem like a lot for 20 lines of C. But it is actually fair for 20 lines of C code that do something and behave as intended. I achieved the 27 minutes time by :
  • Least quantity of compilation, static analysis and general parsing, not to mention testing, of the submitted function : I did not compile or analyze my entry before sending it in. I was relieved to find out that it compiled at all after I sent the e-mail. I spent half of the working time proofreading my submission, trying to think of all the ways I had written incorrect C programs in the past.

Conclusion

On a more serious note, I am still unsure whether the confidence gained per unit of time was better or worse with reviewing than it would have been with testing. In this instance, for a throwaway submission the tests of which I would never have a chance to reuse, I was happy to forgo testing completely and simply read my function again and again.

If not having to write another unit test is like reaching the week-end, not having to write any unit tests is like being on holidays.

Becoming serious again, I think that similarly to the situation with static analysis and tests, a comparison between reviewing code and tests does not make complete sense because different techniques are good at finding different issues. Rational developers use all available tools (and reviewing code, as one such tool, is probably underrated).

Comments

1. On Wednesday, March 20 2013, 21:29 by John Regehr

Hi Pascal, I agree that code simplicity and readability should be the main metric, but I was worried I wouldn't have time to fully understand 35 entries. I plan to look closely at maybe a dozen entries that are among the fastest and smallest.

2. On Wednesday, March 20 2013, 22:40 by pascal

Hi John!

You commented “I agree that code simplicity and readability should be the main metric” but I didn't claim that my submission was simple or readable. I hope I am above this sort of empty boast.