I have a dev board (i.MX RT1024) from NXP, which I write software for using MCUXpresso (NXP's IDE). For my project, I am asked to introduce position independent code (PIC), which, long story short, saves us seconds of downtime when installing a firmware update over the air.
I first created a new standard project, to get my hands dirty and to get a better understanding of how PIC works. After making some changes in the linker script (giving the .got section a load address (LMA) in flash and a run address (VMA) in SRAM) and some modifications in the startup code, I finally had my first step working: I can run code on my MCU compiled as "position independent code".
Obviously, I haven't moved it to "any" location just yet. But just as a first step, I was happy.
My next step was to pull in some code that responds to hardware interrupts. I don't know why, but I had the feeling that this might be tricky. Or maybe I just wanted to see that working too. Unfortunately for me, it fails. Hard.
So I have a project which toggles a pin on my dev board, which toggles an LED. Easy feedback for simple people like me. This works. Not very impressive yet.
Next, I compiled that same project with -fPIC, and got that working too.
Next, I pulled in FreeRTOS and created a simple task which does the "blinking" for me and goes to sleep. Again, I started without compiling with -fPIC. This works too. Again, not impressive, simple small steps.
The task that does the blinking, for reference:
static void main_task(void *params)
{
    while (1)
    {
        GPIO1->DR ^= (1<<24);
        vTaskDelay(100);
    }
}
Now I have tried so many things to get a handle on this, a lead, a vague idea of what is causing it, but I get nowhere. I moved the .got section to ITC SRAM, to DTC SRAM, and I tried not moving it to SRAM at all but leaving it in flash (since I am not moving code in flash yet anyway). Nothing seems to change the behavior. As soon as I compile my FreeRTOS test project with -fPIC, it crashes when the task scheduler starts.
For reference, the linker script modifications:
.got : ALIGN(4)
{
    __global_offset_table_flash_start__ = LOADADDR(.got) ;
    __global_offset_table_itc_start__ = ADDR(.got) ;
    *(.got* .got.*)
    __global_offset_table_flash_end__ = . ;
} >SRAM_DTC AT>PROGRAM_FLASH
SIDE NOTE: the .got section is copied to SRAM_DTC. DTC is memory optimized for data; ITC is memory optimized for instructions (i.e. functions). I think the .got section contains pointers to data, and .got.plt contains the entries for functions in shared libraries (I don't use shared libraries, so I have no .got.plt section). I tried the .got section in both ITC and DTC SRAM. Both fail in the same way. To me it makes most sense to have the .got stored in DTC RAM, so I will stick with that until I learn that some of these assumptions are wrong.
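To make the role of the table concrete, here is a host-side sketch of the indirection as I understand it: every access to a global goes through an address slot in a table, and a base pointer (r9 in my setup) says where that table lives. All names here (`fake_got`, `got_base`, `got_lookup`, `led_state`, `tick_count`) are made up for illustration; in a real build the linker emits the table and the compiler emits the lookups.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical globals standing in for firmware data.
static int led_state = 0;
static int tick_count = 0;

// Hand-built stand-in for the .got section: one address slot per global.
static std::uintptr_t fake_got[2] = {
    reinterpret_cast<std::uintptr_t>(&led_state),
    reinterpret_cast<std::uintptr_t>(&tick_count),
};

// Stand-in for the PIC base register (r9 in the post).
static std::uintptr_t* got_base = fake_got;

// What an access to a global conceptually does: read the slot relative to
// the base, then dereference the address stored there.
int* got_lookup(unsigned slot) {
    return reinterpret_cast<int*>(got_base[slot]);
}

// Toggles led_state, but only ever reaches it through the table.
void toggle_led_via_got() {
    *got_lookup(0) ^= 1;
}
```

The point of the sketch: the table can be copied anywhere as long as the base pointer follows it, but code that runs with a wrong base pointer reads a bogus address on its very first access to a global.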
Also for reference, the adjusted part of the ResetHandler. In the startup code, I set up the r9 register, which is used for PIC. I also copy the .got section over to SRAM_DTC:
// Ignore the volatile stuff, it makes debugging easier, if I don't,
// I see 'optimized out' in mcuxpresso which is making my life harder
// other than that, it serves no functional purpose
// volatile is casted away with const_cast<...>
volatile extern unsigned int __global_offset_table_flash_start__;
volatile extern unsigned int __global_offset_table_itc_start__;
volatile extern unsigned int __global_offset_table_flash_end__;
volatile unsigned int size;
unsigned int index;
unsigned int *global_offset_table_flash;
unsigned int *global_offset_table_itc;
unsigned int *global_offset_table_end_itc;
unsigned int global_offset_table_size;
__attribute__ ((naked, section(".after_vectors.reset")))
void ResetISR(void)
{
    // Disable interrupts
    __asm volatile ("cpsid i");
    // Setup r9 used for PIC, and let it point to the location in flash first
    __asm volatile ("LDR r9, = __global_offset_table_flash_start__");
    // Set the stack pointer, AFTER we setup r9
    __asm volatile ("MSR MSP, %0" : : "r" (&_vStackTop) : );
    //
    // Copy global offset table to ram
    //
    global_offset_table_flash = const_cast<unsigned int*>(&__global_offset_table_flash_start__);
    global_offset_table_itc = const_cast<unsigned int*>(&__global_offset_table_itc_start__);
    global_offset_table_end_itc = const_cast<unsigned int*>(&__global_offset_table_flash_end__);
    size =
        reinterpret_cast<unsigned int>(&__global_offset_table_flash_end__) -
        reinterpret_cast<unsigned int>(&__global_offset_table_itc_start__);
    global_offset_table_size = static_cast<unsigned int>(&__global_offset_table_flash_end__ - &__global_offset_table_itc_start__);
    for (index = 0u; index < size/sizeof(unsigned int); ++index)
    {
        global_offset_table_itc[index] = global_offset_table_flash[index];
    }
    __asm volatile ("LDR r9, = __global_offset_table_itc_start__");
    // ... rest of startup code, initializes VTOR and some other nxp generated stuff
    // ... before it jumps to main()
I understand real men would do this in assembly. I am not man enough for plain assembly yet. I can also imagine that hardcore assembly folks are less familiar with this C++. I am pretty confident that this bit of code does what it claims to do; I debugged through it step by step. The .got section is copied over uint32 by uint32 and ends up in SRAM.
After the jump to main, little code is executed: I configure my LED pin, create a FreeRTOS task with plenty of stack, and start the scheduler.
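For readers who want the copy step without the startup-code noise, this is roughly what the loop boils down to, written as a plain function that can be checked on a PC. The name `copy_words` is made up for this sketch. One thing worth noting: `.` inside the output section is a run (VMA) address, so despite its name `__global_offset_table_flash_end__` lies in SRAM, and "end minus itc_start" is the table size in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy a table of 32-bit words from its load address (flash in the post)
// to its run address (SRAM in the post). size_bytes is the table size in
// bytes; any trailing partial word is ignored, as in the original loop.
void copy_words(const std::uint32_t* src, std::uint32_t* dst,
                std::size_t size_bytes) {
    const std::size_t words = size_bytes / sizeof(std::uint32_t);
    for (std::size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}
```

On the target, `src` would be the load address of .got, `dst` its run address, and `size_bytes` the difference between the two run-address symbols.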
static void main_task(void *params);
int main(void) {
    /* Init board hardware. */
    BOARD_ConfigMPU();
    BOARD_InitBootPins();
    BOARD_InitBootClocks();
    BOARD_InitBootPeripherals();
#ifndef BOARD_INIT_DEBUG_CONSOLE_PERIPHERAL
    /* Init FSL debug console. */
    BOARD_InitDebugConsole();
#endif
    gpio_pin_config_t USER_LED_config = {
        .direction = kGPIO_DigitalOutput,
        .outputLogic = 0U,
        .interruptMode = kGPIO_NoIntmode
    };
    /* Initialize GPIO functionality on GPIO_AD_B1_08 (pin 82) */
    GPIO_PinInit(GPIO1, 24U, &USER_LED_config);
    xTaskCreate(
        main_task,
        "main",
        2000,
        nullptr,
        2,
        nullptr );
    vTaskStartScheduler();
    return 0 ;
}
When starting the scheduler, the code immediately crashes on a mem fault.
Is there anybody who has experience with position independent code and recognizes this behavior? Any tips? Some good pointers to rule things out?
I can share anything code related. It's just a standard new C++ project created in MCUXpresso. I can even share the entire project on GitHub if that helps.
Help/pointers/tips are greatly appreciated!
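With help from the FreeRTOS community, I solved the problem.
All I had to do in the end was modify the FreeRTOS port function that initializes each task's stack, so that the initial stack frame holds a valid r9. A task's registers are loaded from that frame when it is first switched in, and the stock port leaves the r4-r11 slots uninitialized. So the task started with a garbage r9, and its first access through the .got faulted.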
StackType_t * pxPortInitialiseStack( StackType_t * pxTopOfStack,
                                     TaskFunction_t pxCode,
                                     void * pvParameters )
{
    /* Simulate the stack frame as it would be created by a context switch
     * interrupt. */
    /* Offset added to account for the way the MCU uses the stack on entry/exit
     * of interrupts, and to ensure alignment. */
    pxTopOfStack--;
    *pxTopOfStack = portINITIAL_XPSR; /* xPSR */
    pxTopOfStack--;
    *pxTopOfStack = ( ( StackType_t ) pxCode ) & portSTART_ADDRESS_MASK; /* PC */
    pxTopOfStack--;
    *pxTopOfStack = ( StackType_t ) portTASK_RETURN_ADDRESS; /* LR */
    /* Save code space by skipping register initialisation. */
    pxTopOfStack -= 5; /* R12, R3, R2 and R1. */
    *pxTopOfStack = ( StackType_t ) pvParameters; /* R0 */
    /* A save method is being used that requires each task to maintain its
     * own exec return value. */
    pxTopOfStack--;
    *pxTopOfStack = portINITIAL_EXC_RETURN;
    pxTopOfStack -= 8; /* R11, R10, R9, R8, R7, R6, R5 and R4. */
    //
    // I added this part, as suggested by freertos members
    //
    // Patched freertos for supporting -fpic: Set the task's initial R9 value
    __asm ("MOV %[result], R9"
        : [result] "=r" (pxTopOfStack[9-4])
    );
    return pxTopOfStack;
}
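In case the index `9-4` looks like magic: after the final `pxTopOfStack -= 8`, slot 0 of the frame is R4, so R9 sits at slot 5. Below is a host-side sketch that builds the same frame in a plain array so the layout can be checked on a PC. The names (`init_frame`, `StackWord`, the `k...` constants) are mine, the constant values are what I believe the Cortex-M FPU ports use, and the r9 value is passed in as a parameter instead of being read with inline assembly.

```cpp
#include <cassert>
#include <cstdint>

using StackWord = std::uint32_t;

// Assumed to match portINITIAL_XPSR, portINITIAL_EXC_RETURN and
// portSTART_ADDRESS_MASK in the port; check against your port.c.
constexpr StackWord kInitialXpsr      = 0x01000000u;
constexpr StackWord kInitialExcReturn = 0xFFFFFFFDu;
constexpr StackWord kStartAddressMask = 0xFFFFFFFEu;
constexpr int       kR9Slot           = 9 - 4;  // slot 0 is R4

// Builds the initial task frame below `top` and returns the new stack top.
StackWord* init_frame(StackWord* top, StackWord pc, StackWord param,
                      StackWord lr, StackWord r9) {
    *--top = kInitialXpsr;            // xPSR
    *--top = pc & kStartAddressMask;  // PC
    *--top = lr;                      // LR
    top -= 5;                         // skip R12, R3, R2, R1
    *top = param;                     // R0
    *--top = kInitialExcReturn;       // per-task EXC_RETURN
    top -= 8;                         // R11 down to R4, left uninitialized
    top[kR9Slot] = r9;                // the -fPIC patch: valid R9 for the task
    return top;
}
```

The frame is 17 words in total, and the only slot the patch touches is the one the context-switch code later pops into R9.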