I'm trying to understand the linking and loading phases in depth.
When a translation unit is compiled / assembled into a single object file, I understand that it creates a symbol table of every variable / function found.
If a variable has only file scope by using the static keyword for example, it will be marked as local in the symbol table.
However, when the linker produces the final executable file, is there a final symbol table there with every single entry encountered for all files?
I was confused because if we have a variable declared as static meaning only file scope within one file, when this variable is encountered every time in the executable, does the compiler have to reference the final symbol table to see its actual scope, or does it generate special code for it?
Thanks ahead.
When a translation unit is compiled / assembled into a single object file, I understand that it creates a symbol table of every variable / function found.
That is mostly accurate: local (aka stack, aka automatic storage duration) variables are never put into the symbol table (except when using ancient debugging formats, such as STABS).
You don't need to take my word for it; this is trivial to observe:
$ cat foo.c
int a_common_global;
int a_global = 42;
static int a_static = 43;
static int static_fn()
{
  return 44;
}
int global_fn()
{
  int a_local = static_fn();
  static int a_function_static = 1;
  return a_local + a_static + a_function_static;
}
$ gcc -c foo.c
$ readelf -Ws foo.o
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000004 4 OBJECT LOCAL DEFAULT 3 a_static
6: 0000000000000000 11 FUNC LOCAL DEFAULT 1 static_fn
7: 0000000000000008 4 OBJECT LOCAL DEFAULT 3 a_function_static.1800
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000000 0 SECTION LOCAL DEFAULT 7
10: 0000000000000000 0 SECTION LOCAL DEFAULT 5
11: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM a_common_global
12: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 a_global
13: 000000000000000b 34 FUNC GLOBAL DEFAULT 1 global_fn
There are a few things worth noting here:
- a_local does not appear in the symbol table.
- a_function_static got a "random" number appended to its name, so that an a_function_static in a different function will not collide with it.
- a_static and static_fn have LOCAL binding.

Note also that while a_static and static_fn appear in the symbol table, this is done only to assist debugging. The local symbols are not used by the subsequent link, and can be safely removed.
After running strip --strip-unneeded foo.o:
$ readelf -Ws foo.o
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4
4: 0000000000000000 0 SECTION LOCAL DEFAULT 5
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM a_common_global
8: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 a_global
9: 000000000000000b 34 FUNC GLOBAL DEFAULT 1 global_fn
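The suffix on a_function_static.1800 reflects a rule that is easy to check in plain C: function-level statics with the same source name in different functions are separate objects. A minimal sketch follows; next_a, next_b and counter are hypothetical names, not part of foo.c.

```c
/* Each function owns its own `counter`; the two never share storage,
   even though they have the same name in the source. */
int next_a(void)
{
    static int counter = 0;    /* private to next_a */
    return ++counter;
}

int next_b(void)
{
    static int counter = 100;  /* private to next_b */
    return ++counter;
}
```

Compiling this with gcc -c and running readelf -Ws on the result should show two LOCAL OBJECT entries for counter with different suffixes; the exact suffixes depend on the compiler version.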
when the linker produces the final executable file, is there a final symbol table there with every single entry encountered for all files?
Yes. Adding main.c like so:
$ cat main.c
extern int global_fn();
extern int a_global;
int a_common_global = 23;
int main()
{
  return global_fn() + a_common_global + a_global;
}
$ gcc -c main.c foo.c
$ gcc main.o foo.o
$ readelf -Ws a.out
Symbol table '.symtab' contains 69 entries:
Num: Value Size Type Bind Vis Ndx Name
... I omit un-interesting entries (there are many).
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
34: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
35: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
36: 0000000000201030 4 OBJECT LOCAL DEFAULT 23 a_static
37: 000000000000061c 11 FUNC LOCAL DEFAULT 13 static_fn
38: 0000000000201034 4 OBJECT LOCAL DEFAULT 23 a_function_static.1800
50: 0000000000000627 34 FUNC GLOBAL DEFAULT 13 global_fn
63: 00000000000005fa 34 FUNC GLOBAL DEFAULT 13 main
64: 000000000020102c 4 OBJECT GLOBAL DEFAULT 23 a_global
I was confused because if we have a variable declared as static meaning only file scope within one file, when this variable is encountered every time in the executable, does the compiler have to reference the final symbol table to see its actual scope, or does it generate special code for it?
At link stage, the compiler is (usually) not invoked at all. And the linker doesn't (doesn't need to) pay any attention to LOCAL symbols.
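As a rough illustration of why LOCAL symbols play no part in symbol resolution, here is a toy model of the lookup. It is not how any real linker is implemented; the toy_symbol struct and toy_resolve function are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

enum binding { BIND_LOCAL, BIND_GLOBAL };

/* Hypothetical, heavily simplified symbol table entry. */
struct toy_symbol {
    const char *name;
    enum binding bind;
    int defined;          /* nonzero if this entry is a definition */
    unsigned long value;  /* address assigned to the symbol */
};

/* Return the GLOBAL definition of `name`, or NULL if there is none.
   LOCAL entries are skipped, so they can never satisfy a reference. */
const struct toy_symbol *toy_resolve(const struct toy_symbol *syms,
                                     size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (syms[i].bind == BIND_GLOBAL && syms[i].defined &&
            strcmp(syms[i].name, name) == 0)
            return &syms[i];
    }
    return NULL;
}
```

With a table holding static_fn (LOCAL) and global_fn (GLOBAL), toy_resolve finds global_fn but returns NULL for static_fn, so a reference to static_fn from another object file could never bind to it.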
In general, the linker only does two things:
- resolving undefined symbol references (here global_fn and a_global from main.o) to their definitions (here in foo.o), and
- applying relocations.

Applying relocations for a_static and a_function_static in foo.o doesn't actually need their names; only their offsets within the .data section, as this output should make clear:
$ objdump -dr foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
...
000000000000000b <global_fn>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: 48 83 ec 10 sub $0x10,%rsp
13: b8 00 00 00 00 mov $0x0,%eax
18: e8 e3 ff ff ff callq 0 <static_fn>
1d: 89 45 fc mov %eax,-0x4(%rbp)
20: 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 26 <global_fn+0x1b>
22: R_X86_64_PC32 .data
26: 8b 45 fc mov -0x4(%rbp),%eax
29: 01 c2 add %eax,%edx
2b: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 31 <global_fn+0x26>
2d: R_X86_64_PC32 .data+0x4
31: 01 d0 add %edx,%eax
33: c9 leaveq
34: c3 retq
Note how relocations at offset 0x22 and 0x2d don't say anything about the names (a_static and a_function_static.1800 respectively).
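To make the arithmetic concrete: for R_X86_64_PC32 the x86-64 psABI defines the stored value as S + A - P (symbol address, plus addend, minus the address of the field being patched). The sketch below is only an illustration with made-up function names. It assumes the 4-byte field is the last part of the instruction, which holds for the two mov instructions above.

```c
#include <stdint.h>

/* Value the linker writes for R_X86_64_PC32: S + A - P, truncated to 32 bits.
   s = address of the symbol (here the .data section symbol),
   a = addend from the relocation entry,
   p = address of the 4-byte field being patched. */
int32_t pc32_value(uint64_t s, int64_t a, uint64_t p)
{
    return (int32_t)(s + (uint64_t)a - p);
}

/* Address the instruction ends up reading: %rip (which points just past
   the 4-byte field, assuming the field ends the instruction) plus the
   sign-extended displacement. */
uint64_t rip_target(uint64_t p, int32_t disp)
{
    return p + 4 + (uint64_t)(int64_t)disp;
}
```

Plugging in numbers from the a.out listing, and assuming foo.o's .text landed at 0x61c (where static_fn is) and its .data at 0x20102c (where a_global is): for the relocation at offset 0x22, P = 0x61c + 0x22 = 0x63e, so the stored displacement is 0x20102c + 0 - 0x63e = 0x2009ee, and the load reads 0x63e + 4 + 0x2009ee = 0x201030, the address of a_static. For 0x2d the same steps give 0x201034, the address of a_function_static.1800. No symbol name is consulted at any point.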