Global Offset Table .got and .got.plt must be zero-initialized for STM32 microcontroller

I am in the process of compiling a program for an ARM Cortex-M4 microcontroller without the STM32 IDE. I use the arm-none-eabi toolchain with newlib libc, a linker script I adapted for my specific microcontroller and some startup code I took from ST.

After countless hours of debugging I found that a specific memory area must be zero-initialized. Otherwise, 0x200006E8...0x2000071c would contain random data which the CPU would access at some point, resulting in a Hard Fault. readelf -S shows that this corresponds to the .got and .got.plt sections:

  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .isr_vector       PROGBITS        08000000 001000 00018c 00   A  0   0  1
  [ 2] .text             PROGBITS        0800018c 00118c 00c4c4 00  AX  0   0  4
  [ 3] .rodata           PROGBITS        0800c650 00d650 0000a4 00   A  0   0  4
  [ 4] .ARM.extab        PROGBITS        0800c6f4 00d6f4 000000 00   A  0   0  1
  [ 5] .ARM              ARM_EXIDX       0800c6f4 00d6f4 000008 00  AL  2   0  4
  [ 6] .preinit_array    PREINIT_ARRAY   0800c6fc 00e71c 000000 04  WA  0   0  1
  [ 7] .init_array       INIT_ARRAY      0800c6fc 00d6fc 000004 04  WA  0   0  4
  [ 8] .fini_array       FINI_ARRAY      0800c700 00d700 000004 04  WA  0   0  4
  [ 9] .data             PROGBITS        20000000 00e000 0006e8 00  WA  0   0  8
  [10] .got              PROGBITS        200006e8 00e6e8 000028 00  WA  0   0  4
  [11] .got.plt          PROGBITS        20000710 00e710 00000c 04  WA  0   0  4
  [12] .sram2            PROGBITS        2000c000 00e71c 000000 00   W  0   0  1
  [13] .bss              NOBITS          2000071c 00e71c 000500 00  WA  0   0  4

The startup code only zero-initializes the .bss section, and does not mention got or plt. The linker script also has no reference to either got or plt. This means that there are no start and end labels for the GOT (like for .bss) which could be used by the startup code to initialize it, and I really don't want to hard-code the addresses.

I use the following flags to compile with GCC:

-mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -fno-common -ffunction-sections -fdata-sections -Wl,--gc-sections -specs=nano.specs -ffreestanding -Wall -O0 -ggdb

and to link:

-T link.ld -lc -lgcc

To my understanding, the GOT is only needed for dynamic linking. Why is it inserted although the linker script does not specify it? I tried to remove the .got and .got.plt sections with objcopy, but this has no effect on the problem.

I am confused why the GOT should be initialized with zeroes in order to work, and not with some addresses. Is it possible that during the linking stages the GOT got inserted "into" the .bss somehow, i.e., the GOT section should really be part of .bss? The GOT is right before the .bss section (which is initialized as expected).

Ideally, I would like to just zero-initialize the .bss section and not make any modifications to the startup code. The GOT should either be not used at all or be statically populated.

Any ideas on what could be going on here are highly appreciated.

Solution

Maybe this is stuff you already know; I cannot tell from the question...

unsigned int x;

void fun ( void )
{
    x=5;
}

MEMORY {
    one : ORIGIN = 0x000, LENGTH = 256
    two : ORIGIN = 0x100, LENGTH = 256
}
SECTIONS {
    .text       : { *(.text)       } > one
    .bss        : { *(.bss)        } > two
}

arm-none-eabi-gcc -O2 -c -mcpu=cortex-m4 so.c -o so.o
arm-none-eabi-ld -Tso.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf


Disassembly of section .text:

00000000 <fun>:
   0:   4b01        ldr r3, [pc, #4]    ; (8 <fun+0x8>)
   2:   2205        movs    r2, #5
   4:   601a        str r2, [r3, #0]
   6:   4770        bx  lr
   8:   00000100    andeq   r0, r0, r0, lsl #2

Disassembly of section .bss:

00000100 <x>:
 100:   00000000    andeq   r0, r0, r0

IMO this is the normal, typical case (meaning optimized, with no PIC/PIE stuff, etc.; I will get to gc-sections in a bit). You can see that the compiler puts a word in the literal pool for the linker to fill in with the address of x.

Note, I never use these, as I do a lot of this bare-metal embedded stuff, but run man gcc and look for -fpic, -fPIC, -fpie and -fPIE to see the differences; it is an interesting read. For this code, all four options produce the same thing:

Disassembly of section .text:

00000000 <fun>:
   0:   4b03        ldr r3, [pc, #12]   ; (10 <fun+0x10>)
   2:   4a04        ldr r2, [pc, #16]   ; (14 <fun+0x14>)
   4:   447b        add r3, pc
   6:   589b        ldr r3, [r3, r2]
   8:   2205        movs    r2, #5
   a:   601a        str r2, [r3, #0]
   c:   4770        bx  lr
   e:   bf00        nop
  10:   00000008    andeq   r0, r0, r8
  14:   00000000    andeq   r0, r0, r0

It is a double indirection, or let us say it adds a level of indirection. Now note the problem:

   4:   447b        add r3, pc

The value in the pool is not the address of the GOT but the offset to the GOT relative to the PC. The compiler is going to generate this for every function, and that is the whole point of using a GOT in the first place: you do not want fixed addresses in the literal pool of every function for addressing data.
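To make the extra level of indirection concrete, here is a minimal host-side model of the two listings above. It is a sketch under stated assumptions: target memory is a plain word array indexed by byte address / 4, and the names target_mem, store_x_direct and store_x_via_got are made up for illustration. The pc argument stands for whatever value the add r3, pc instruction contributes; on real Thumb code that is the instruction address plus 4, which this model does not try to reproduce.

```c
#include <stdint.h>

/* Hypothetical stand-in for target memory, indexed by byte address / 4. */
uint32_t target_mem[64];

/* Non-PIC listing: the literal pool holds the absolute address of x,
 * so one load (ldr r3, [pc, #4]) gets the address and one store writes x. */
void store_x_direct(uint32_t pooled_addr_of_x, uint32_t value)
{
    target_mem[pooled_addr_of_x / 4] = value;                  /* str r2, [r3, #0] */
}

/* PIC listing: the pool holds the GOT's offset relative to the PC and the
 * offset of x's slot inside the GOT; the GOT entry itself holds x's address. */
void store_x_via_got(uint32_t pc, uint32_t pooled_got_offset,
                     uint32_t pooled_slot_offset, uint32_t value)
{
    uint32_t got_base  = pc + pooled_got_offset;                /* add r3, pc       */
    uint32_t addr_of_x = target_mem[(got_base + pooled_slot_offset) / 4]; /* ldr r3, [r3, r2] */
    target_mem[addr_of_x / 4] = value;                          /* str r2, [r3, #0] */
}
```

The store in the PIC version goes wherever the GOT slot says. If that slot holds random data, as in the question, the final store goes to a random address, which on the target is one plausible route to a Hard Fault.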

Position independence will also (mostly, in theory; you can try it) make the binary itself position independent, and yes, since every data access now goes through the GOT, this makes the code bigger. So only use PIC/PIE if you absolutely need it, especially on resource-limited MCUs. Every byte you save is a win, and the clock-cycle penalties are often measurable.

If you were using position independence, you could do one of two things with it, or both: move the code, move the data, or move both. If you move the code then, given how it was built for this target, you have to keep the GOT at the same distance relative to the code. PIC is meant for shared libraries, which implies a RAM-based system: an operating system loading programs into RAM. We are not on a RAM-based system here unless you build for RAM and then copy and jump. As shown above (and the linker does not change/fix this), the code-to-GOT address relationship has to be fixed (for this compiler, version, target, example, etc.; if it happens in one example, it can happen to you).

But if you want to move the data, then you have to update the GOT, which means it has to be in RAM. The binary is assumed to be in flash, so if you want to run it from a different location (flash or RAM), you have to move the GOT along with it, keeping the same relative distance. Then IF you want to move the data (whether the binary moves or not), you have to go through the GOT itself and add the relocation offset to the address in each entry. And yes, in both cases you need to know where the GOT is and how big it is.
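The per-entry fix-up described above amounts to a loop like the following. This is a sketch, not toolchain-provided code: got_relocate and its parameters are hypothetical, and it assumes every GOT entry points into the region that moved. If the GOT also holds code addresses and only the data moved, those entries would have to be skipped.

```c
#include <stdint.h>

/* Hypothetical helper: after the data has been moved by `delta` bytes,
 * add the same delta to every GOT entry in [got_start, got_end).
 * ASSUMES every entry points into the region that moved. */
void got_relocate(uint32_t *got_start, const uint32_t *got_end, int32_t delta)
{
    for (uint32_t *entry = got_start; entry != got_end; ++entry) {
        /* Converting a negative delta to uint32_t is well defined
         * (modulo 2^32), so the addition also works for moves down. */
        *entry += (uint32_t)delta;
    }
}
```

On a target, got_start and got_end would come from labels you add around .got in the linker script, which is why you need to know where the GOT is and how big it is.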

If you do not move anything, then the GOT is ready to go... from a build perspective. If you link it for RAM on an MCU, though, it will not be populated unless you do that in the bootstrap as you would for .data or .bss, and that means, like .data or .bss, you have to add variables to the linker script (for this toolchain) or use some other trick.
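A minimal sketch of that bootstrap step for .got, modeled on the usual .data copy and .bss zero loops. The symbol names in the comment (_sigot, _sgot, _egot) are hypothetical; you would define them in your linker script the same way the .data symbols are defined. It is written in C here only so it can be checked on a host. The answer recommends doing this in asm, and one concrete reason is that GCC may recognize loops like these at higher optimization levels and turn them into calls to memcpy/memset, which is exactly what you do not want before the C runtime is set up.

```c
#include <stdint.h>

/* Copy a section's initial contents from its load address (flash) to its
 * run address (RAM). On the target this might be called as, e.g.:
 *     extern uint32_t _sigot[], _sgot[], _egot[];   (hypothetical names)
 *     copy_section(_sigot, _sgot, _egot);
 */
void copy_section(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start != end) {
        *start++ = *load++;   /* one word at a time, flash -> RAM */
    }
}

/* .bss is the same loop with zero as the source. */
void zero_section(uint32_t *start, const uint32_t *end)
{
    while (start != end) {
        *start++ = 0;
    }
}
```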

And it is basically some variation on this:

MEMORY {
    one : ORIGIN = 0x000, LENGTH = 256
    two : ORIGIN = 0x200, LENGTH = 256
}
SECTIONS {
    .text       : { *(.text)       } > one
    __LAB0__ = .;
    .bss        :
    {
        __LAB1__ = .;
        *(.bss)
        __LAB2__ = .;
    } > two AT > one
    __LAB3__ = .;
    .got        :
    {
        __LAB4__ = .;
        *(.got)
        __LAB5__ = .;
    } > two AT > one
}



.align
.word __LAB1__
.word __LAB2__
.word __LAB3__
.word __LAB4__
.word __LAB5__

I put labels both inside and outside the output sections because, while one would think they should always be the same, the GNU linker does not always agree. You also want some alignment in there, and you want to adjust your copy loops (in asm; do not bootstrap C in C, never use memset or memcpy, just do it right and use asm to bootstrap C) so that you can, for example, ldm/stm two or four registers at a time. So align on an 8- or 16-byte boundary (two or four words) and copy a multiple of that size. Then you can use the exact addresses: when the next address equals the end address, exit the loop.
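Here is a C analogue of such a loop, a sketch that mirrors an ldmia/stmia pair with four registers. copy_words_x4 is a hypothetical name, and the code ASSUMES the linker script aligned both the start and the end of the section to 16 bytes. If the length is not a multiple of four words, the != test never matches and the loop runs past the end, which is why the alignment matters.

```c
#include <stdint.h>

/* Copy four words per iteration, like ldmia/stmia with four registers.
 * ASSUMES (dst_end - dst) is an exact multiple of 4 words; the loop only
 * stops when dst lands exactly on dst_end. */
void copy_words_x4(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst != dst_end) {
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];  /* "ldmia" */
        dst[0] = a; dst[1] = b; dst[2] = c; dst[3] = d;            /* "stmia" */
        src += 4;
        dst += 4;
    }
}
```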

Anyway, in the same way you already isolate the start and size of .bss and handle .data, you would just mimic the .data linker script entries and copy code for .got if you want it in RAM.

If you ended up with a .got because of some other, already-built thing you are trying to link in, then put .got (and .got.plt as well) in flash in your linker script. Then, of course, use readelf and objdump to confirm that things are where they should be and that the addresses are correct before you try to run it.

So your question has some confusing holes.

-mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -fno-common -ffunction-sections -fdata-sections -Wl,--gc-sections -specs=nano.specs -ffreestanding -Wall -O0 -ggdb

This looks borrowed. The -ffunction-sections and -fdata-sections options are there so that the linker can do --gc-sections, but then you specified a separate linker command line:

-T link.ld -lc -lgcc

The -lc is very, very scary; it makes me shudder. But okay, that is for more SO questions, I guess... -lgcc won't work as-is when you call the linker directly; you need to do that from gcc or add a library path, which is just how GNU works.

-O0 does not pair well with --gc-sections: either you want to make it smaller or you want to make it bigger; pick one. Or is the -O0 there for some industry requirement to intentionally avoid optimizations for safety reasons?

To actually have things removed, you combine the -ffunction-sections/-fdata-sections options at compile time with --gc-sections at link time:

arm-none-eabi-gcc -O2 -c -ffunction-sections -fdata-sections -mcpu=cortex-m4 so.c -o so.o
arm-none-eabi-ld -Tso.ld --gc-sections --print-gc-sections so.o -o so.elf
arm-none-eabi-objdump -D so.elf

so.elf: file format elf32-littlearm

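That is the whole output: nothing references fun, so the linker discarded everything. You want --print-gc-sections in there so you can see what is being removed; otherwise one might think "hey, this did a good job" and only find the bug later, after bricking something or being confused as to what is happening. The fix here is to give the linker an entry point to start from: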

ENTRY(fun)
MEMORY {
    one : ORIGIN = 0x000, LENGTH = 256
    two : ORIGIN = 0x100, LENGTH = 256
}
SECTIONS {
    .text       : { *(.text*)      } > one
    .bss        : { *(.bss*)       } > two
}

The linker follows all the code and data references, starting from the entry point, and removes whatever it does not reach, so you have to be very careful with items that you want to keep but do not otherwise reference by name.

In any case, with that fixed:

Disassembly of section .text:

00000000 <fun>:
   0:   4b01        ldr r3, [pc, #4]    ; (8 <fun+0x8>)
   2:   2205        movs    r2, #5
   4:   601a        str r2, [r3, #0]
   6:   4770        bx  lr
   8:   00000100    andeq   r0, r0, r0, lsl #2

Disassembly of section .bss:

00000100 <x>:
 100:   00000000    andeq   r0, r0, r0

In a larger program, this removes unused data and functions, saving a noticeable amount of the space consumed by the program. And the code is optimized, saving more space and running faster.

You did not provide enough information for a complete answer, but there were some hints, as we covered in the comments. YOUR code may not have been creating the .got, but some code you linked in may have. In that case, make the linker put the .got items in flash and you are done with them. If you can build without PIC/PIE, then do that instead. Re-evaluate all of your command-line options; -fno-common, for example, seems borrowed and scary. There is no need for the debugger options unless you need them, but if you do, you still want to build for release and debug that, as release and debug builds are (or can be) different binaries and can have different results, especially on bare metal.

From what we can derive from your question, if you need to initialize .got, then you need to wrap it in the linker script with variables just like .data and do a copy loop just like .data. Use .data as the reference: cut and paste and change the names. Do that only if you have to change .got at runtime. If you do not, then put .got in flash and it is done, no init needed. If it is possible to rebuild whatever is creating the .got (and you have no need for position independence), that is even better: smaller, faster and easier to deal with. Only specify position independence if you plan to use it and have added the runtime code (not generated by the tools; this is on you) to do the work to use it.

A proper GOT will not be all zeros, not for this platform. As you likely already understand, it is filled with addresses, and it needs to be in order to work. Unless, well, the library is built so that you somehow have to figure out where everything is and then fill in the GOT and PLT yourself, which makes my head hurt. You are not doing dynamic libraries here, right?
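One cheap way to catch the symptom from the question (a GOT holding random data or zeros) is to check that every .got entry points into a real memory region before relying on it. This is a sketch with hypothetical names: got_first_bad_entry and struct region are made up, the 0x08000000 and 0x20000000 bases come from the readelf output in the question, and any region sizes you pass in are assumptions to replace with your part's actual flash and SRAM sizes. Apply it to .got only; in a static link the three reserved .got.plt words can legitimately be zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A memory region [base, base + size). */
struct region {
    uint32_t base;
    uint32_t size;
};

/* Unsigned subtraction wraps for addr < base, so one compare covers both ends. */
static bool in_region(uint32_t addr, const struct region *r)
{
    return addr - r->base < r->size;
}

/* Returns the index of the first GOT entry that does not point into any of
 * the given regions, or n if every entry looks plausible. */
size_t got_first_bad_entry(const uint32_t *got, size_t n,
                           const struct region *regions, size_t nregions)
{
    for (size_t i = 0; i < n; ++i) {
        bool ok = false;
        for (size_t r = 0; r < nregions && !ok; ++r) {
            ok = in_region(got[i], &regions[r]);
        }
        if (!ok) {
            return i;
        }
    }
    return n;
}
```

For example, with regions {0x08000000, 0x100000} and {0x20000000, 0x20000} (sizes assumed), an entry of 0 or 0xFFFFFFFF is reported as bad, while addresses such as 0x0800c650 or 0x200006e8 pass.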