When Unit Tests Fail

This week, my colleague +Giancarlo B. showed me this short function.
char *unescape(char *in)
{
        char *tmp;
        int i, x;
        char b[5];
 
        tmp = calloc(strlen(in), sizeof(char));
        x = 0;
        for(i = 0; i < strlen(in); i++) {
                if(in[i] == '+')
                        tmp[x++] = 32;
                else if(in[i] == '%') {
                        memset(b, 0, 5);
                        strncpy(b, &in[i + 1], 2);
                        tmp[x++] = (char) strtol(b, NULL, 16);
                        i += 2;
                } else {
                        tmp[x++] = in[i];
                }
        }
        tmp[x] = 0;
        return tmp;
}
It's purpose is to convert a string such as "this+is+a%20space" into "this is a space". The function works pretty well provided that the input string contains at least a "%xx" sequence. If not, the output allocated string is one character too short. To fix this, it's sufficient to substitute
        tmp = calloc(strlen(in), sizeof(char));
with
        tmp = calloc(strlen(in) + 1, sizeof(char));
The thing that makes this bug special is that it can only be found by looking at the code. Let's see why.

The Speed Of calloc

Image by Martin Maciaszek https://www.flickr.com/photos/fastjack/The first thing to understand is how calloc works. I've learn this thing a couple of years ago when searching for the differences in speed with malloc + memset. Basically, calloc returns a pointer to a memory area that belongs to an already blank page, so there is no need to clear it, saving time. At this link there is an extended explanation.

This means that is (almost) guaranteed the next byte after the memory returned by calloc is blank. Or, in other words, that the string is NULL-terminated. But unfortunately this is true only until another calloc is called.

This second call is likely to return a pointer to the first unallocated byte that can be later changed into something different from NULL, generating unexpected behaviors.

False Negative

Now you should have understood why unit tests can fail here. Suppose you have this code:
int do_test()
{
        int err = 0;       /* 0 = no errors */
        char *s1 = "this+is+a%20space";
        char *s2 = "thisisnotaspace";
        char *t1 = "this is a space";
        char *t2 = "thisisnotaspace";

        char *r1 = unescape(s1);
        if (strcmp(t1, r1) != 0)
                err = 1;   /* error on the fist test */

        char *r2 = unescape(s2);
        if (strcmp(t2, r2) != 0)
                err = 2;   /* error on the second test */

        free(r1);
        free(r2);
        return err;
}
I expect that this function always returns 0, that means no errors. But obviously this is wrong. And, even if I know a couple of way to modify the test code to catch this particular error, it's not said that enabling or disabling some compiler flags we obtain the same behavior.

Conclusions

I really believe that unit tests are useful in order to find a wide range of bugs, but, in some situation, developer's experienced eyes are indispensable.

Post a Comment