In his article about understanding the Linux kernel initcall mechanism, Trevor created a userspace program that simulates the mechanism for calling the init_module() of Linux drivers.
#include <stdio.h>

typedef int (*initcall_t)(void);
extern initcall_t __initcall_start, __initcall_end;

#define __initcall(fn) \
    static initcall_t __initcall_##fn __init_call = fn
#define __init_call __attribute__ ((unused,__section__ ("function_ptrs")))
#define module_init(x) __initcall(x);

#define __init __attribute__ ((__section__ ("code_segment")))

static int __init
my_init1 (void)
{
    printf ("my_init () #1\n");
    return 0;
}

static int __init
my_init2 (void)
{
    printf ("my_init () #2\n");
    return 0;
}

module_init (my_init1);
module_init (my_init2);

void
do_initcalls (void)
{
    initcall_t *call_p;

    call_p = &__initcall_start;
    do {
        fprintf (stderr, "call_p: %p\n", call_p);
        (*call_p)();
        ++call_p;
    } while (call_p < &__initcall_end);
}

int
main (void)
{
    fprintf (stderr, "in main()\n");
    do_initcalls ();
    return 0;
}
As you can see, __initcall_start and __initcall_end are declared but never defined, so the linker complains and does not produce an executable. The solution was to customize the default linker script (generated by ld --verbose) by adding the following lines before the text section:
__initcall_start = .;
function_ptrs : { *(function_ptrs) }
__initcall_end = .;
code_segment : { *(code_segment) }
Here is a snippet from the output of objdump -t :
0000000000000618 g function_ptrs 0000000000000000 __initcall_end
0000000000000608 g .plt.got 0000000000000000 __initcall_start
0000000000000608 l O function_ptrs 0000000000000008 __initcall_my_init1
0000000000000610 O function_ptrs 0000000000000008 __initcall_my_init2
0000000000000618 l F code_segment 0000000000000017 my_init1
I understand the mechanism; I just don't see how the linker understood that __initcall_start should point to the function_ptrs section, or how __initcall_end will point to the code_segment section either.
The way I see it, __initcall_start is assigned the value of the current output location, then a section function_ptrs is defined, which will point to the function_ptrs section from the input files, but I cannot see the link between __initcall_start and the function_ptrs section.
My question is: how is the linker able to understand that __initcall_start should point to function_ptrs?
__initcall_start = .;
function_ptrs : { *(function_ptrs) }
__initcall_end = .;
code_segment : { *(code_segment) }
This bit of linker script instructs the linker how to compose a certain part of the output file. It means:

- Define a symbol __initcall_start addressing the location counter (i.e. .).
- Then emit an output section function_ptrs composed of the concatenation of all the input sections called function_ptrs (i.e. the function_ptrs sections from all the input files).
- Define a symbol __initcall_end, again addressing the location counter.
- Then emit an output section code_segment composed of the concatenation of all the input sections called code_segment.

The function_ptrs section is the very first storage laid out at the location addressed by __initcall_start. So __initcall_start is the address at which the linker starts the function_ptrs section. __initcall_end addresses the location right after the function_ptrs section. And by the same token, it is the address at which the linker starts the code_segment section.
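The "symbol, then section contents, then symbol" layout described above can be observed from C without editing the linker script. The sketch below is not the setup from the question: it assumes an ELF toolchain (GCC or Clang with GNU ld or lld) and the default linker script, where the linker automatically provides __start_SECNAME and __stop_SECNAME for a section whose name is a valid C identifier. Those two symbols play the roles of __initcall_start and __initcall_end. The names REGISTER_INITCALL, init_a, init_b, initcall_count, in_function_ptrs and registered_entries_are_bracketed are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*initcall_t)(void);

/* Place one function pointer in the "function_ptrs" section.
   `used` keeps the variable even though nothing refers to it by name. */
#define REGISTER_INITCALL(fn) \
    static initcall_t initcall_ptr_##fn \
    __attribute__((used, section("function_ptrs"))) = fn

static int init_a(void) { return 0; }
static int init_b(void) { return 0; }

REGISTER_INITCALL(init_a);
REGISTER_INITCALL(init_b);

/* Assumption: these are defined by the linker (GNU ld / lld convention),
   not by any C source file. They only name two addresses. */
extern initcall_t __start_function_ptrs;
extern initcall_t __stop_function_ptrs;

/* Number of pointer-sized slots between the two boundary addresses. */
size_t initcall_count(void)
{
    uintptr_t lo = (uintptr_t)&__start_function_ptrs;
    uintptr_t hi = (uintptr_t)&__stop_function_ptrs;
    return (hi - lo) / sizeof(initcall_t);
}

/* Nonzero if the object at p lies inside [start, stop). */
int in_function_ptrs(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&__start_function_ptrs &&
           a <  (uintptr_t)&__stop_function_ptrs;
}

/* Both registered pointers should sit between the boundary symbols. */
int registered_entries_are_bracketed(void)
{
    return in_function_ptrs(&initcall_ptr_init_a) &&
           in_function_ptrs(&initcall_ptr_init_b);
}
```

Nothing in the C source links the boundary symbols to the two pointer variables. The connection exists only because the linker defines one address just before it lays out the section and another just after.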
The way I see it, __initcall_start is assigned the value of the current output location,...
You are thinking that:
__initcall_start = .;
causes the linker to create a symbol that in some sense is a pointer and assigns the current location as the value of that pointer. A bit like this C code:
void * ptr = &ptr;
The same thinking is in evidence here (emphasis mine):
I just don't see how the linker understood that __initcall_start should *point to* the function_ptrs section, or how __initcall_end will *point to* the code_segment section either.
The linker has no concept of a pointer. It deals in symbols that symbolise addresses.
In the linker manual, under "Assignment: Defining Symbols", you see:
You may create global symbols, and assign values (addresses) to global symbols, using any of the C assignment operators:
symbol = expression ;
...
This means simply that symbol is defined as a symbol for the address computed by expression.
Likewise:
__initcall_start = .;
means that __initcall_start is defined as a symbol for the address at the current location counter. It implies no type whatever for that symbol, not even that it is a data symbol or a function symbol. The type of a symbol S is a programming-language concept that expresses how a program in that language may consume a byte-sequence whose address is symbolised by S.
A C program has a free hand to declare any type it likes for an external symbol S that it uses, as long as the linkage provides that symbol. Whatever type that might be, the program will obtain the address that is symbolised by S with the expression &S.
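To illustrate that the declared type is purely the program's choice, here is a minimal sketch in which the boundary symbols of a section holding long values are declared as plain char. It assumes an ELF toolchain (GCC or Clang with GNU ld or lld) and relies on the linker-provided __start_SECNAME and __stop_SECNAME symbols rather than on a custom linker script. The section name demo_vals and the names IN_DEMO_VALS, demo_section_bytes and demo_sum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Put a variable in a section named "demo_vals" (hypothetical name).
   `used` prevents the compiler from dropping the unreferenced variable. */
#define IN_DEMO_VALS __attribute__((used, section("demo_vals")))

static long v1 IN_DEMO_VALS = 10;
static long v2 IN_DEMO_VALS = 20;
static long v3 IN_DEMO_VALS = 30;

/* Assumption: provided by the linker. They are declared as char here on
   purpose: the linker attaches no type to them, only an address. */
extern char __start_demo_vals;
extern char __stop_demo_vals;

/* Size of the section in bytes, computed from the two addresses. */
size_t demo_section_bytes(void)
{
    return (size_t)((uintptr_t)&__stop_demo_vals -
                    (uintptr_t)&__start_demo_vals);
}

/* Reinterpret the same address range as an array of long and sum it. */
long demo_sum(void)
{
    const long *p   = (const long *)&__start_demo_vals;
    const long *end = (const long *)&__stop_demo_vals;
    long sum = 0;

    for (; p < end; ++p)
        sum += *p;
    return sum;
}
```

In both functions, &__start_demo_vals yields the address the linker assigned. What the program does with the bytes at that address is decided entirely on the C side.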
Your C program chooses to declare both __initcall_start and __initcall_end as of type initcall_t, defined by:
typedef int (*initcall_t)(void);
which makes good sense in the context of what the program tells the linker to do. It tells the linker to lay out the function_ptrs section between the addresses symbolised by __initcall_start and __initcall_end. This section comprises an array of pointers to functions of type int (void), i.e. an array of initcall_t. So initcall_t *, the type of &__initcall_start, is exactly right for traversing that array, as in:
    call_p = &__initcall_start;
    do {
        fprintf (stderr, "call_p: %p\n", call_p);
        (*call_p)();
        ++call_p;
    } while (call_p < &__initcall_end);
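The traversal can also be tried without a custom linker script. The following is a minimal sketch, not the program from the question: it assumes an ELF toolchain (GCC or Clang with GNU ld or lld), which provides __start_SECNAME and __stop_SECNAME for a section whose name is a valid C identifier. The section name demo_initcalls and the names DEMO_INITCALL, first_init, second_init, run_demo_initcalls and demo_calls_made are hypothetical. The boundary symbols are declared here as incomplete arrays, another type the program is free to choose, so that the array name already evaluates to the address the linker assigned.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Register a function by storing its address in "demo_initcalls". */
#define DEMO_INITCALL(fn) \
    static initcall_t demo_initcall_##fn \
    __attribute__((used, section("demo_initcalls"))) = fn

static int calls_made;

static int first_init(void)  { calls_made += 1;  return 0; }
static int second_init(void) { calls_made += 10; return 0; }

DEMO_INITCALL(first_init);
DEMO_INITCALL(second_init);

/* Assumption: defined by the linker, one at each end of the section. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

/* Call every registered function; return how many were called.
   A for loop is used so that an empty section results in zero calls. */
int run_demo_initcalls(void)
{
    int n = 0;

    for (initcall_t *call_p = __start_demo_initcalls;
         call_p < __stop_demo_initcalls; ++call_p) {
        (*call_p)();
        ++n;
    }
    return n;
}

/* Accessor for the side effects left by the init functions. */
int demo_calls_made(void)
{
    return calls_made;
}
```

Calling run_demo_initcalls() from main() invokes both registered functions. The order in which entries from a single source file appear in the section is left to the compiler, so the sketch does not depend on it.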