
C: clarification on translation unit


If we have two .c files and a .h file: main.c sub.c sub.h, where

main.c

#include "sub.h"
...

sub.c

#include "sub.h"
...

we can compile the program with either i)

gcc -o a.out main.c sub.c

or ii)

gcc -c main.c
gcc -c sub.c
gcc -o a.out main.o sub.o

Given this case, does the preprocessor output one or two translation units?

I am confused because main.c includes sub.h, which suggests that the preprocessor would output a single compilation unit. On the other hand, two object files, main.o and sub.o, are created before the executable, which makes me think "two source files, thus two translation units."

Which part am I misunderstanding, or where am I making a mistake?


Solution

  • Here's what the C standard has to say about that:

    A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [..] Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

    (Source: C99 draft standard, 5.1.1.1 §1)

    So in both of your cases you have two translation units. One of them comes from the compiler preprocessing main.c and everything that is included through #include directives—that is, sub.h and probably <stdio.h> and other headers. The second comes from the compiler doing the same thing with sub.c.

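    The difference between your first and second examples is that in the latter you explicitly store the separately translated translation units as object files. In the former, gcc still translates main.c and sub.c separately; it just uses temporary object files that it discards after linking.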
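As a minimal sketch of how the two translation units communicate through identifiers with external linkage, the three files could look like the listing below. The names `sub_add`, `sub_calls` and `call_into_sub` are hypothetical and do not come from the question. The files are shown in one listing for brevity, and `main()` is left out so the listing also compiles as a single file.

```c
/* ---- sub.h : declarations shared by both source files ---- */
#ifndef SUB_H
#define SUB_H

int sub_add(int a, int b);  /* function identifier with external linkage */
extern int sub_calls;       /* object identifier with external linkage */

#endif /* SUB_H */

/* ---- sub.c : begins with #include "sub.h" ----
 * After preprocessing, the header text plus the definitions below
 * make up the first translation unit. */
int sub_calls = 0;

int sub_add(int a, int b)
{
    sub_calls++;            /* count calls so the other unit can observe them */
    return a + b;
}

/* ---- main.c : also begins with #include "sub.h" ----
 * After preprocessing, the header text plus the code below make up the
 * second translation unit. main() is omitted from this sketch;
 * call_into_sub() stands in for code that lives in main.c. */
#include <assert.h>

int call_into_sub(void)
{
    int before = sub_calls;
    int sum = sub_add(2, 3);          /* defined in sub.c, resolved at link time */
    assert(sub_calls == before + 1);  /* the shared object was updated by sub.c */
    return sum;
}
```

Split into the three files, with a `main()` added to main.c, this should build with either of the two command sequences from the question. In both cases the header text is preprocessed twice, once as part of each translation unit.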

    Note that nothing ties an object file to exactly one translation unit. The GNU linker, for example, can combine several .o files into a single relocatable object file (ld -r).

    The standard, as far as I know, does not specify the extension of source files. In practice, you are free to #include a .c file in another one, or to place your entire program in a .h file. With gcc you can use the option -x c, placed before the file name, to force a .h file to be treated as the starting point of a translation unit.

    The distinction made here:

    A source file together with all the headers and source files included via the preprocessing directive #include [...]

    is because a header need not be a source file. Similarly, the contents of <...> in an #include directive need not be a valid file name. How exactly the compiler uses the named headers <...> and "..." is implementation-defined.