Tags: c, compilation, fortran, static-analysis, c99

Tentative definitions in C and linking


Consider the C program composed of two files,

f1.c:

int x;

f2.c:

int x=2;

My reading of section 6.9.2 of the C99 standard is that this program should be rejected. As I read 6.9.2, variable x is tentatively defined in f1.c, but at the end of the translation unit this tentative definition becomes an actual definition, so f1.c should behave as if it contained the definition int x=0;. The program would then contain two external definitions of x, one in each file.
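To make that reading concrete, here is a minimal single-file sketch of how 6.9.2 treats tentative definitions inside one translation unit. It is separate from f1.c and f2.c, and the variable y and the helper check_tentative are hypothetical names added only for illustration. Repeated tentative definitions are permitted within a single file; the open question above is what happens across files.

```c
#include <assert.h>

int x;      /* tentative definition */
int x;      /* a second tentative definition is allowed in the same translation unit */
int x = 2;  /* actual definition; the two lines above now act as plain declarations */

int y;      /* never initialized in this file: treated as if it were `int y = 0;` */

/* Hypothetical helper: checks the values that 6.9.2 implies for this file. */
void check_tentative(void)
{
    assert(x == 2);  /* the initialized definition wins */
    assert(y == 0);  /* the tentative definition became a zero-initialized one */
}
```

Calling check_tentative() from a main function should pass both assertions on a conforming C compiler.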

With all compilers (and, importantly, linkers) I was able to try, this is not what happens. All compilation platforms I tried do link the above two files, and the value of x is 2 in both files.

I doubt this happens by accident, or just as an "easy" feature to provide in addition to what the standard requires. If you think about it, it means there is special support in the linker for those global variables that do not have an initializer, as opposed to those explicitly initialized to zero. Someone told me that the linker feature may be necessary to compile Fortran anyway. That would be a reasonable explanation.

Any thoughts about this? Other interpretations of the standard? Names of platforms on which files f1.c and f2.c refuse to be linked together?

Note: this is important because the question occurs in the context of static analysis. If the two files may refuse to be linked on some platform, the analyzer should complain, but if every compilation platform accepts it then there is no reason to warn about it.


Solution

  • See also What are extern variables in C. The behaviour you observe is listed in the C standard's informative Annex J as a common extension:

    J.5.11 Multiple external definitions

    There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).

    Warning

    As @litb points out here, and as stated in my answer to the cross-referenced question, using multiple definitions for a global variable leads to undefined behaviour, which is the standard's way of saying "anything could happen". One of the things that can happen is that the program behaves as you expect; and J.5.11 says, approximately, "you might be lucky more often than you deserve". But a program that relies on multiple definitions of an extern variable - with or without the explicit extern keyword - is not a strictly conforming program and not guaranteed to work everywhere. Equivalently: it contains a bug which may or may not show itself.

    See also How do I use extern to share variables between source files?

    As noted by Sven in a comment, and in my answer to "How do I use extern…", GCC changed its default relatively recently. GCC 10.x (from May 2020) and later versions compile with -fno-common by default, whereas prior versions defaulted to -fcommon. With -fno-common, uninitialized globals are no longer emitted as common symbols that the linker merges, so tentative definitions of the same variable in several translation units produce a multiple-definition error at link time. That matches the standard's rule that a program may contain at most one external definition of an object.

    If you use GCC and have code that (ab)uses multiple tentative definitions, you can add -fcommon to the compiler options and it will work as before. However, such code is not maximally portable. In the long term it is better to revise it so that each variable is defined in exactly one source file (linked into every program that uses the variable) and declared in one header. Every source file that uses the variable should include that header, and so should the file that defines it, so that the compiler can check the declaration against the definition.
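A minimal sketch of that layout follows, shown as one listing with comments marking where each file would begin. The file names globals.h, globals.c and user.c, and the function read_x, are hypothetical; in a real project the two source files would each contain `#include "globals.h"` rather than sharing one listing.

```c
/* ---- globals.h (hypothetical name): the one declaration ---- */
#ifndef GLOBALS_H
#define GLOBALS_H
extern int x;   /* declaration only; no storage is allocated here */
#endif

/* ---- globals.c: the one definition; would #include "globals.h" ---- */
int x = 2;      /* the compiler checks this against the extern declaration */

/* ---- user.c: any file that uses x; would #include "globals.h" ---- */
int read_x(void)
{
    return x;   /* refers to the single definition in globals.c */
}
```

Because x has exactly one definition, this arrangement does not depend on -fcommon or on the linker merging common symbols.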