
Does including a header obviate the need for extern declarations?


On page 33 of K&R (The C Programming Language, 2nd edition), the authors remark:

If the program is in several source files, and a variable is defined in file1 and used in file2 and file3, then extern declarations are needed in file2 and file3 to connect the occurrences of the variable. The usual practice is to collect extern declarations of variables and functions in a separate file, historically called a header, that is included by #include at the front of each source file. The suffix .h is conventional for header names. The functions of the standard library, for example, are declared in headers like <stdio.h>.

I am trying to understand how the first and second sentences above are connected. In particular, is the suggestion that the "usual practice" mentioned in the second sentence is a way to circumvent the requirement of using extern declarations noted in the first sentence?

Let me rephrase and expand. Does the first sentence essentially say that if I don't use #include (as in the second sentence), then I need extern declarations to let the linker (?) know that it must look for the variable in another object file? By contrast, perhaps they are not necessary if I do #include a header that defines the variable, because that is text substitution (by the preprocessor), and so the variable is declared in the same object file as the one in which it is used?

I suspect that in the end my confusion arises from not fully understanding the compilation (and linking) process.


Solution

  • Copying and expanding on a comment:

    You need to understand the difference between a source file, a header, and a translation unit (TU). The TU is the source file, plus any headers that it includes (directly or indirectly) — and that's what the compiler really compiles. If you include extern declarations in a header and include that header in the source file (directly or indirectly), then the extern declarations are visible in the TU just as surely as if you'd placed the declarations in the source file. The reason to declare them in a header is the reduced risk of inconsistency — of different declarations in different files.

    But this was followed up with:

    I am still not sure I understand then. Say I have a global [integer] variable var which I use in my main(). My understanding is that I would not need to use an extern int var; declaration in main() iff I either declare var in that same source file or if I use #include as you describe, but that I would need to use extern int var; if var were declared in another object file which was not #included in my present file (i.e. if they were only connected by the linker). That's more of what my question is about, if that makes sense?

    You need to read "How do I use extern to share variables between source files?", which in turn references "What is the difference between a definition and a declaration?"

    I've quietly assumed that var has the type int — it doesn't much matter what the type is, but it needs a type.

    Somewhere in amongst the collection of object files that you link to create your program, there must be one (and only one) file scope definition of the variable var with external linkage (which is not the same as being preceded by extern).

    This might be written:

    int var;       // Roughly equivalent to int var = 0;
    

    or

    int var = 37;  // Or any other relevant value
    

    Formally, the first variation is a tentative definition. If the TU contains no non-tentative definition of var (one with an initializer) by its end, the tentative definition ceases to be tentative and is treated as a definition initialized to 0.

    The header will contain:

    extern int var;    // No initializer
    

    Key Point

    All files that use var, and the file that defines var, will include the only header that declares the variable.


    This gives the vital cross-checking necessary to ensure that everything is consistent. It will require some care to organize the contents of headers so that small source files are not inundated with copious unused declarations (and type definitions, etc.). But with practice, it becomes easy. You should also ensure that your headers are self-contained, idempotent and minimal.

    IMO, you should never write a declaration of any function with external linkage in any source file. If the function is to be used from multiple source files, there should be a header to declare the function, which should be included both where the function is defined and where it is used. If the function is not to be used from multiple source files, it should be defined (and perhaps declared) as static. If a static function is defined before it is used, there is no need to declare it separately, though some projects prefer to declare all static functions near the top of the source file even though it is not strictly necessary. If a static function is called before it is defined, you must declare it before using it (in C99 or later, though compilers, especially older ones, are hit-and-miss about enforcing that without help from the warning options).