In C and C++ we can manipulate a variable's linkage. There are three kinds of linkage: no linkage, internal linkage, and external linkage. My question is probably related to why these are called "linkage" (how is that related to the linker?).
I understand that a linker is able to handle variables with external linkage, because references to such a variable are not confined to a single translation unit, and therefore not confined to a single object file. How that actually works under the hood is typically discussed in courses on operating systems.
But how does the linker handle variables (1) with no linkage and (2) with internal linkage? What are the differences in these two cases?
The linker isn't normally involved in either internal linkage or no linkage--they're resolved entirely by the compiler, before the linker gets into the act at all.
Internal linkage means two declarations at different scopes in the same translation unit can refer to the same thing.
No linkage means two declarations at different scopes in the same translation unit can't refer to the same thing.
So, if I have something like:
void f() {
    static int x; // no linkage
}
...no other declaration of `x` in any other scope can refer to this `x`. The linker is involved only to the degree that it typically has to produce a field in the executable telling it the size of static space needed by the executable, and that will include space for this variable. Since it can never be referred to by any other declaration, there's no need for the linker to get involved beyond that, though (in particular, the linker has nothing to do with resolving the name).
Internal linkage means declarations at different scopes in the same translation unit can refer to the same object. For example:
static int x; // at namespace scope, so `x` has internal linkage
void f() {
    extern int x; // declaration in one scope
}
void g() {
    extern int x; // declaration in another scope
}
Assuming we put these all in one file (i.e., they end up as a single translation unit), the declarations in both `f()` and `g()` refer to the same thing--the `x` that's defined as `static` at namespace scope.
For example, consider code like this:
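#include <iostream>
static int x; // at namespace scope, so `x` has internal linkage
// f and g return void: flowing off the end of a non-void function
// that is actually called is undefined behavior in C++.
void f()
{
    extern int x; // refers to the static x above
    ++x;
}
void g()
{
    extern int x; // refers to the same x
    std::cout << x << '\n';
}
int main() {
    g();
    f();
    g();
}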
This will print:
0
1
...because the `x` being incremented in `f()` is the same `x` that's being printed in `g()`.
The linker's involvement here can be (and usually is) pretty much the same as in the no linkage case--the variable `x` needs some space, and the linker specifies that space when it creates the executable. It does not, however, need to get involved in determining that when `f()` and `g()` both declare `x`, they're referring to the same `x`--the compiler can determine that.
We can see this in the generated code. For example, if we compile the code above with gcc, the relevant bits for `f()` and `g()` are these.
f:
    movl _ZL1x(%rip), %eax
    addl $1, %eax
    movl %eax, _ZL1x(%rip)
That's the increment of `x` (it uses the name `_ZL1x` for it).
g:
    movl _ZL1x(%rip), %eax
    [...]
    call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c@PLT
So that's basically loading up `x`, then sending it to `std::cout` (I've left out code for other parameters we don't care about here).
The important part is that the code refers to `_ZL1x`--the same name as `f` used, so both of them refer to the same object.
The linker isn't really involved, because all it sees is that this file has requested space for one statically allocated variable. It makes space for that, but doesn't have to do anything to make `f` and `g` refer to the same thing--that's already handled by the compiler.