ld: How does this ld script work?

In his article about understanding the Linux Kernel Initcall Mechanism, Trevor created a userspace program that simulates the mechanism used to call the init_module() functions of Linux drivers.

#include <stdio.h>

typedef int (*initcall_t)(void);
extern initcall_t __initcall_start, __initcall_end;

#define __initcall(fn) \
        static initcall_t __initcall_##fn __init_call = fn
#define __init_call     __attribute__ ((unused,__section__ ("function_ptrs")))
#define module_init(x)  __initcall(x);

#define __init __attribute__ ((__section__ ("code_segment")))

static int __init
my_init1 (void)
{
        printf ("my_init () #1\n");
        return 0;
}

static int __init
my_init2 (void)
{
        printf ("my_init () #2\n");
        return 0;
}

module_init (my_init1);
module_init (my_init2);

void
do_initcalls (void)
{
        initcall_t *call_p;

        call_p = &__initcall_start;
        do {
                fprintf (stderr, "call_p: %p\n", call_p);
                (*call_p)();
                ++call_p;
        } while (call_p < &__initcall_end);
}

int
main (void)
{
        fprintf (stderr, "in main()\n");
        do_initcalls ();
        return 0;
}

As you can see, __initcall_start and __initcall_end are declared but never defined, so the linker will complain and will not produce an executable. The solution was to customize the default linker script (generated by ld --verbose) by adding the following lines before the text section:

__initcall_start = .;
function_ptrs : { *(function_ptrs) }
__initcall_end   = .;
code_segment    : { *(code_segment) }

Here is a snippet from the output of objdump -t :

0000000000000618 g function_ptrs        0000000000000000         __initcall_end
0000000000000608 g .plt.got             0000000000000000         __initcall_start
0000000000000608 l O function_ptrs      0000000000000008      __initcall_my_init1
0000000000000610   O function_ptrs      0000000000000008      __initcall_my_init2
0000000000000618 l F code_segment       0000000000000017          my_init1

I understand the mechanism; I just don't see how the linker understood that __initcall_start should point to function_ptrs section or how the __initcall_end will point to the code_segment section either.

The way I see it, __initcall_start is assigned the value of the current output location, then a section function_ptrs is defined, which will point to the function_ptrs section from the input files, but I cannot see the link between __initcall_start and the function_ptrs section.

My question is: how is the linker able to understand that __initcall_start should point to function_ptrs?

Solution

__initcall_start = .;
function_ptrs : { *(function_ptrs) }
__initcall_end   = .;
code_segment    : { *(code_segment) }

This bit of linker script tells the linker how to compose a certain part of the output file. It means:

Emit a symbol __initcall_start addressing the location counter (i.e. .).
Then emit a section called function_ptrs composed of the concatenation of all the input sections called function_ptrs (i.e. the function_ptrs sections from all the input files).
Then emit a symbol __initcall_end, again addressing the location counter.
Then emit a section called code_segment composed of the concatenation of all the input sections called code_segment.

The function_ptrs section is the very first storage laid out at the location addressed by __initcall_start, so __initcall_start is the address at which the linker starts the function_ptrs section. __initcall_end addresses the location right after the function_ptrs section and, by the same token, it is the address at which the linker starts the code_segment section. The objdump output in the question confirms this: __initcall_start and __initcall_my_init1 share the address 0x608, and __initcall_end and my_init1 share the address 0x618.

The way I see it, __initcall_start is assigned the value of the current output location,...

You are thinking that:

    __initcall_start = .;

causes the linker to create a symbol that in some sense is a pointer and assigns the current location as the value of that pointer. A bit like this C code:

void * ptr = &ptr;

The same thinking is in evidence here (emphasis mine):

I just don't see how the linker understood that __initcall_start should point to function_ptrs section or how the __initcall_end will point to the code_segment section either.

The linker has no concept of a pointer. It deals in symbols that symbolise addresses.

In the linker manual, under Assignment: Defining Symbols, you see:

You may create global symbols, and assign values (addresses) to global symbols, using any of the C assignment operators:

symbol = expression ;

...

This means simply that symbol is defined as a symbol for the address computed by expression. Likewise:

__initcall_start = .;

means that __initcall_start is defined as a symbol for the address at the current location counter. It implies no type whatever for that symbol, not even that it is a data symbol or a function symbol. The type of a symbol S is a programming-language concept that expresses how a program in that language may consume a byte-sequence whose address is symbolised by S.

A C program has a free hand to declare any type it likes for an external symbol S that it uses, as long as the linkage provides that symbol. Whatever type that might be, the program will obtain the address that is symbolized by S with the expression &S.

Your C program chooses to declare both __initcall_start and __initcall_end as of type initcall_t, that is:

typedef int (*initcall_t)(void);

which makes good sense in the context of what the program tells the linker to do. It tells the linker to lay out the function_ptrs section between the addresses symbolized by __initcall_start and __initcall_end. This section comprises an array of pointers to functions of type int (void). So with the symbols declared as initcall_t, &__initcall_start is an initcall_t * that is exactly right for traversing that array, as in:

call_p = &__initcall_start;
do {
        fprintf (stderr, "call_p: %p\n", call_p);
        (*call_p)();
        ++call_p;
} while (call_p < &__initcall_end);