Cyclic dependency of global variables with extern specifier

A global variable can be declared without being defined by using the extern storage class specifier. So I believe a circular dependency can be introduced between global variables, just as classes/modules can be made mutually dependent using forward declarations. How does a linker handle such dependencies among variable definitions? Does this practice produce undefined behavior?

//source2.cpp

extern int b;
int a = b + 1;

//source1.cpp

#include <iostream>

extern int a;
int b = a + 1;

int main() {
    std::cout << a << " " << b << std::endl;
}

or even,

#include <iostream>

extern int a;
int b = a + 1;
int a = b + 1;

int main() {
    std::cout << a << " " << b << std::endl;
}

Both programs print 2 1. What is happening? I guess the linker resolved the external symbol a to have the value 0. But how did it decide that symbol resolution was finished, instead of getting stuck forever in a recursive search for the variables' definitions?

Solution

This is what the standard has to say:

Variables with static storage duration are initialized as a consequence of program initiation. Variables with thread storage duration are initialized as a consequence of thread execution. Within each of these phases of initiation, initialization occurs as follows.

[...] Constant initialization is performed if a variable or temporary object with static or thread storage duration is initialized by a constant initializer for the entity. If constant initialization is not performed, a variable with static storage duration (6.7.1) or thread storage duration (6.7.2) is zero-initialized (11.6). Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. All static initialization strongly happens before (4.7.1) any dynamic initialization. [ Note: The dynamic initialization of non-local variables is described in 6.6.3; that of local static variables is described in 9.7. —end note ]

An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that

the dynamic version of the initialization does not change the value of any other object of static or thread storage duration prior to its initialization, and

the static version of the initialization produces the same value in the initialized variable as would be produced by the dynamic initialization if all variables not required to be initialized statically were initialized dynamically.

[ Note: As a consequence, if the initialization of an object obj1 refers to an object obj2 of namespace scope potentially requiring dynamic initialization and defined later in the same translation unit, it is unspecified whether the value of obj2 used will be the value of the fully initialized obj2 (because obj2 was statically initialized) or will be the value of obj2 merely zero-initialized. For example,
inline double fd() { return 1.0; }
extern double d1;
double d2 = d1;    // unspecified:
                   // may be statically initialized to 0.0 or
                   // dynamically initialized to 0.0 if d1 is
                   // dynamically initialized, or 1.0 otherwise
double d1 = fd();  // may be initialized statically or dynamically to 1.0
—end note ]

[...]

If [some conditions] V is defined before W within a single translation unit, the [dynamic] initialization of V is sequenced before the initialization of W.

Conceptually, static initialization is performed at translation time: the compiler emits a symbol whose value is the already-initialized value. In some cases this will be 0; in others, it will be the result of evaluating a constant-expression initializer and/or calling a constexpr constructor for the variable. If any dynamic initialization needs to be done (because the variable's initializer does not satisfy the conditions for constant initialization), then the compiler emits a piece of code that initializes the variables in that translation unit in definition order. The linker takes all these pieces of code that perform dynamic initialization and combines them in some order (possibly interleaved).
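To make the three kinds of initialization concrete, here is a small sketch. The variable names are made up for illustration, and the volatile is only there to force a read that can never be a constant expression, so the last variable needs dynamic initialization.

```cpp
// Illustrative names only; none of these appear in the question.
int zeroed;                      // no initializer: zero-initialized to 0 (static)
int constant = 6 * 7;            // constant initializer: constant initialization (static)

volatile int runtime_input = 5;  // itself constant-initialized, but reading a
                                 // volatile object is never a constant expression

// Not a constant initializer: the variable is zero-initialized first,
// then dynamically initialized by startup code emitted for this translation unit.
int computed_at_startup = runtime_input + 1;
```

By the time main reads them, zeroed is 0, constant is 42, and computed_at_startup is 6; only the last one needed any code to run at startup.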

There is no infinite recursion, because the dynamic initialization of a does not kick off the dynamic initialization of b; it simply uses whatever value b already has, either because b was already dynamically initialized, or because it still has its value from static initialization. And vice versa. If b is dynamically initialized before a (and you have no guarantee of this, since the two variables are defined in different translation units), then at the time of b's dynamic initialization, a has the value 0, so b becomes 1; then when a is dynamically initialized, its value becomes 2, so you see the result 2 1. But if a is dynamically initialized before b, you see 1 2.
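The two orderings can be sketched with a toy model. This is not how a real linker or runtime is implemented; Globals, dynamic_init_a, dynamic_init_b and run_startup are hypothetical names that only mimic "zero-initialize everything, then run the dynamic initializers in some order".

```cpp
#include <utility>

// Toy model: both variables start at 0 (zero-initialization),
// then the two dynamic initializers run in one of two orders.
struct Globals {
    int a = 0;
    int b = 0;
};

void dynamic_init_a(Globals& g) { g.a = g.b + 1; }  // models: int a = b + 1;
void dynamic_init_b(Globals& g) { g.b = g.a + 1; }  // models: int b = a + 1;

// Returns {a, b} after running the initializers in the chosen order.
std::pair<int, int> run_startup(bool b_first) {
    Globals g;
    if (b_first) {
        dynamic_init_b(g);  // reads a == 0, so b becomes 1
        dynamic_init_a(g);  // reads b == 1, so a becomes 2
    } else {
        dynamic_init_a(g);  // reads b == 0, so a becomes 1
        dynamic_init_b(g);  // reads a == 1, so b becomes 2
    }
    return {g.a, g.b};
}
```

Each initializer is a plain read followed by a write, so neither one ever waits on the other: run_startup(true) yields {2, 1} and run_startup(false) yields {1, 2}.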

In the case where there is only one translation unit, b's dynamic initialization must occur before a's, because dynamic initializations within a single translation unit occur in order of definition (not declaration). That explains the result 2 1 that you are seeing. However, this result is still not guaranteed, because of the provision allowing dynamic initialization to be done statically. The compiler may choose to statically give a the value 2, because that is the value it would have if it were dynamically initialized. If the compiler chose to make a's initialization completely static but did not do so for b, then the dynamic initialization of b would give it the value 3.
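If you want to watch the definition order at work, one option is to record what each initializer reads. The sketch below is a variation on the single-file program; seen_by_a, seen_by_b, init_a and init_b are names I made up. The values in the comments are what I would expect from mainstream compilers, but as explained above, the standard does not strictly guarantee them.

```cpp
extern int a;  // declarations only, so the helpers below can refer to both
extern int b;

int seen_by_b = -1;  // constant-initialized; overwritten when b is initialized
int seen_by_a = -1;  // constant-initialized; overwritten when a is initialized

// Hypothetical helpers that record the value they read before using it.
int init_b() { seen_by_b = a; return a + 1; }
int init_a() { seen_by_a = b; return b + 1; }

int b = init_b();  // defined first: typically reads a == 0, so b becomes 1
int a = init_a();  // defined second: typically reads b == 1, so a becomes 2
```

Printing the four variables from main would typically show that b's initializer saw a as 0 and a's initializer saw b as 1, which matches the 2 1 result in the question.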

What about the case with two different translation units? Here the standard's wording is not clear, but my interpretation is that the implementation is allowed to fully statically initialize either or both of a and b to any value that the variable could have under some valid order of dynamic initialization! If only a is fully statically initialized, it could be statically initialized to either 1 or 2, causing b to become 2 or 3, respectively, during dynamic initialization. Likewise, if only b is fully statically initialized, it could be statically initialized to either 1 or 2, causing a to become 2 or 3, respectively. So:

For the first program, the possible results are 1 2, 2 1, 2 3, or 3 2.
For the second program, the possible results are 2 1 and 2 3.

I think that in practice, a compiler that gave either variable the value of 3 would make some users very angry and would probably stop doing this. Still, the theoretical possibility exists.

A way to avoid the issue of unpredictable initialization order is to forbid non-constant initializers for non-local static variables. In that case, there is no possibility of dynamic initialization occurring, so all initialization of non-local static variables happens in a well-defined order and results in a well-defined value, and in fact will most likely be evaluated at compile time.
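As a rough sketch of that rule, assuming the intended values are known at compile time, the initializers can be expressed through constexpr constants. The names a_value and b_value are hypothetical.

```cpp
// Every initializer below is a constant expression, so no dynamic
// initialization is needed and no ordering question can arise.
constexpr int b_value = 1;
constexpr int a_value = b_value + 1;

int b = b_value;  // constant initialization
int a = a_value;  // constant initialization

// Checked at compile time, so the values cannot depend on startup order.
static_assert(a_value == 2 && b_value == 1,
              "initializers are compile-time constants");

// With C++20, writing `constinit int a = a_value;` would additionally make the
// compiler reject any initializer that requires dynamic initialization.
```

The cycle from the question cannot be written this way at all, since a constexpr variable has to be defined before it is used in another constant expression.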