I am experimenting by statically compiling a minimal program and examining the system calls that are issued:
$ cat hello.c
#include <unistd.h>   /* declares write() */
int main (void) {
    write(1, "Hello world!", 12);
    return 0;
}
$ gcc hello.c -static
$ objdump -f a.out
a.out: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004003c0
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 39 vars */]) = 0
uname({sys="Linux", node="ubuntu", ...}) = 0
brk(0) = 0xa20000
brk(0xa211a0) = 0xa211a0
arch_prctl(ARCH_SET_FS, 0xa20880) = 0
brk(0xa421a0) = 0xa421a0
brk(0xa43000) = 0xa43000
write(1, "Hello world!", 12Hello world!) = 12
exit_group(0) = ?
I know that when linked non-statically, ld emits startup code to map libc.so and ld.so into the process's address space, and ld.so would continue loading any other shared libraries.
But in this case, why are so many system calls issued, apart from execve, write and exit_group?
Why the heck uname(2)? Why so many calls to brk(2) to get and set the program break, and a call to arch_prctl(2) to set the process state, when that seems like something that should have been done in kernel-space, at execve time?
uname is needed to check that the kernel version is not too ancient.
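To illustrate the kind of check described above, here is a minimal sketch in C. It is not glibc's actual startup code: parse_release and kernel_at_least are hypothetical names, and it assumes the release string starts with "major.minor", as Linux release strings normally do.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: pull "major.minor" out of a release string such as
   "5.15.0-91-generic". Returns 0 on success, -1 if the string does not parse. */
static int parse_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the running kernel is at least
   min_major.min_minor, 0 if it is older, -1 on error. */
static int kernel_at_least(int min_major, int min_minor)
{
    struct utsname u;
    int major, minor;

    if (uname(&u) != 0)          /* the uname(2) call that shows up in strace */
        return -1;
    if (parse_release(u.release, &major, &minor) != 0)
        return -1;
    return major > min_major || (major == min_major && minor >= min_minor);
}
```

Calling kernel_at_least(2, 6), for instance, asks whether the kernel is 2.6 or newer; the threshold here is only an example, not the one libc actually uses.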
Two of the brk calls are needed to set up thread-local storage. The other two are needed to set up the dynamic loader path (the executable might still call dlopen, even though it is statically linked). I'm not sure why these come in pairs.
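For reference, this is roughly what querying and moving the program break looks like from C. It is only a sketch using glibc's sbrk wrapper, which sits on top of brk(2); grow_break is a hypothetical name, and glibc may cache the break, so not every sbrk(0) necessarily appears as a brk call in strace.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc's <unistd.h> */
#include <unistd.h>

/* Hypothetical helper: grow the program break by `bytes` and return how far
   it actually moved, or -1 if the kernel refused to move it. */
static long grow_break(long bytes)
{
    char *before = sbrk(0);            /* query the current break, like brk(0) */

    if (sbrk(bytes) == (void *)-1)     /* ask for the break to be moved up */
        return -1;

    char *after = sbrk(0);             /* query again to see where it ended up */
    return (long)(after - before);
}
```

With nothing else allocating in between, grow_break(4096) should report that the break moved by 4096 bytes, which is the same query-then-set pattern visible in the trace.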
On my system arch_prctl isn't called; set_thread_area is called in its place. Either way, the call sets up TLS for the current thread.
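If you want to see what that call left behind, the FS base can be read back with ARCH_GET_FS. This is a sketch that assumes Linux on x86-64 (the platform in the question); fs_base is a hypothetical name, and on any other platform the stub below simply returns 0.

```c
#define _GNU_SOURCE             /* exposes syscall() in glibc's <unistd.h> */
#include <unistd.h>
#include <sys/syscall.h>

#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>          /* ARCH_GET_FS, ARCH_SET_FS */

/* Hypothetical helper: return the current thread's FS base, which is the
   address libc passed to arch_prctl(ARCH_SET_FS, ...). Returns 0 on failure. */
static unsigned long fs_base(void)
{
    unsigned long base = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return base;
}
#else
/* Not applicable outside Linux/x86-64; stub so the sketch still compiles. */
static unsigned long fs_base(void) { return 0; }
#endif
```

In the statically linked program from the question, the value returned should correspond to the address shown in the arch_prctl(ARCH_SET_FS, ...) line of the strace output.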
These things could probably be done lazily (i.e. when the corresponding facilities are used for the first time), but perhaps that would make no sense performance-wise (just a guess).
By the way, gdb-7.x can stop on system calls with the catch syscall command.