How C Compilers Work Part 2 - Preprocessor

As mentioned in the previous post, modern compilers do not run preprocessing as a separate phase; it is performed together with compilation. Nevertheless, understanding the role of the preprocessor is really helpful. The first thing to note is that it takes its instructions only from lines that start with the hash character (#), which are called directives.

In a typical program, those lines specify header files to include and constants or macros to substitute in the rest of the file. Another frequently used feature is conditional compilation (#if, #ifdef, etc.), which enables parts of the code only if a condition is met at compile time. Here GCC's -D flag is really useful, because it defines a macro directly from the command line.

A more unusual directive is #pragma, which issues commands directly to the compiler, in some cases to enable a vendor-specific option. Other commonly used directives are #warning and #error; they force the compiler to emit a warning or an error in special situations (usually depending on which other files are, or are not, included in the project). Note that #warning was a compiler extension until C23, although GCC has supported it for a long time.

An Example

Now let's see what a preprocessor does. Look at this simple program:
#include <stdio.h>
#include <string.h>

#define ARGS    1
#define TEST    "test_arg"

/* Main function */
int main (int argc, char **argv)
{
        if (argc != ARGS + 1) {
                fprintf(stderr, "Error! Expected %d param\n", ARGS);
                return 1;
        }

        if (strcmp(argv[1], TEST) != 0) {
                fprintf(stderr, "Error! Expected %s\n", TEST);
                return 2;
        }

        fputs("OK!\n", stdout);
        return 0;
}
Now if you compile it with:
gcc -Wall -E -o main_pp.c main.c
you'll get another C file named main_pp.c as a result (the -E flag tells GCC to run only the preprocessor). If you don't have a compiler available, you can look at it here. Pretty big, isn't it?

What you should notice is that the #include and #define directives have been processed and the comment has been removed. This obviously helps the programmer, but almost all the work done by the preprocessor can also be done by hand. In other words, the preprocessor is not indispensable. If you compile the following piece of code, you'll notice no difference in program execution compared to the original one.
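/* Hand-written stand-ins for what <stdio.h> and <string.h> declare. */
typedef struct foo FILE;
/* extern: refer to the C library's objects instead of defining new ones
 * (assumes a libc, such as glibc, where these are real variables). */
extern FILE *stdout;
extern FILE *stderr;

int fprintf(FILE *, const char *, ...);
int fputs(const char *, FILE *);
int strcmp(const char *, const char *);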

int main (int argc, char **argv)
{
        if (argc != 1 + 1) {
                fprintf(stderr, "Error! Expected %d param\n", 1);
                return 1;
        }

        if (strcmp(argv[1], "test_arg") != 0) {
                fprintf(stderr, "Error! Expected %s\n", "test_arg");
                return 2;
        }

        fputs("OK!\n", stdout);
        return 0;
}
How is this possible? How can struct foo be a FILE? And what about the other functions? For the answer, you'll have to wait for the next two chapters of this series.

Troubleshooting

Preprocessor errors are usually easy to understand. For example:
failed.c:1:21: fatal error: missing.h
means that the header file missing.h does not exist or is not in the include path. Another easily understood error is the following:
failed.c:3:0: error: unterminated #ifdef
which reminds us that an #endif is missing.

References

  • If you want to play with the above examples, source files are here.
  • A full explanation of the GCC preprocessor can be found at this page.
  • The idea for the second example has been taken from this blog post.
