Creating a Cortex-M7 project from scratch: where to start?

I want to create my own startup, linker-script and init files, and configure the makefile and gcc toolchain. Where can I find resources, tutorials etc. about it? Maybe some minimal example implementations?

Solution

A close-to-minimal bootstrap; it can certainly be made smaller.

flash.s

.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20001000   @ initial stack pointer (top of the 4K RAM)
.word reset                  @ reset handler
.word hangout                @ NMI
.word hangout                @ HardFault
.word hangout                @ MemManage

.thumb_func
reset:
    bl notmain
    b hangout

.thumb_func
hangout:   b .

.align

.thumb_func
.globl PUT32
PUT32:
    str r1,[r0]
    bx lr

.thumb_func
.globl GET32
GET32:
    ldr r0,[r0]
    bx lr

.thumb_func
.globl dummy
dummy:
    bx lr

Read the ARM documentation. The exception and reset table is not documented as well as it could be, but it still shows that the stack pointer init value comes first, the reset vector second, and so on for the internal core exceptions. Then it goes on into the interrupts, whose count is partly core defined and partly chip-vendor defined: 16, 32, 64, 128, fewer or more...
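For reference, a sketch of the ARMv7-M (Cortex-M7) table layout as C constants, so the byte offset of any slot is just its index times four. The enum and helper names are made up for illustration; on ARMv6-M (Cortex-M0) the MemManage, BusFault, UsageFault and DebugMonitor slots are reserved instead.

```c
#include <stdint.h>

/* Word index of each entry in an ARMv7-M vector table.
   Byte offset from the table base is index * 4. */
enum vector_index {
    VEC_INITIAL_SP = 0,   /* loaded into MSP at reset, not a handler */
    VEC_RESET      = 1,
    VEC_NMI        = 2,
    VEC_HARDFAULT  = 3,
    VEC_MEMMANAGE  = 4,
    VEC_BUSFAULT   = 5,
    VEC_USAGEFAULT = 6,
    /* 7..10 reserved */
    VEC_SVCALL     = 11,
    VEC_DEBUGMON   = 12,
    /* 13 reserved */
    VEC_PENDSV     = 14,
    VEC_SYSTICK    = 15,
    VEC_IRQ0       = 16   /* first chip-vendor interrupt */
};

/* Byte offset of a core exception slot. */
static uint32_t vector_offset(enum vector_index idx)
{
    return (uint32_t)idx * 4u;
}

/* Byte offset of vendor interrupt number `irq` (IRQ0, IRQ1, ...). */
static uint32_t irq_vector_offset(uint32_t irq)
{
    return ((uint32_t)VEC_IRQ0 + irq) * 4u;
}
```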

example program to demonstrate a C entry point and calls into the asm.

notmain.c

void PUT32 ( unsigned int, unsigned int );
void notmain ( void )
{
    unsigned int ra;
    for(ra=0;;ra++) PUT32(0x20000100,ra);
}

not quite minimal linker script but close

flash.ld

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > rom
    .rodata : { *(.rodata*) } > rom
    .bss : { *(.bss*) } > ram
}

Technically the vector table address resets to 0x00000000 (VTOR), but some chip vendors map the application flash at another address as well as at zero when booting from that flash. The STM32 family is generally at 0x08000000; some others use 0x01000000, or I think maybe it was 0x10000000, whatever. Either way the flash has to be mapped at zero for reset (if this code is really called at reset and there isn't a bootloader faking a reset). So you can leave 0x00000000 for rom or try changing it.
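If you link at the real flash address and want the core to use that copy of the table, you point VTOR (the SCB register at 0xE000ED08) at it. The table base has an alignment rule: at least 128 bytes, and otherwise the table size rounded up to a power of two. A sketch of that check; the helper names are hypothetical and the 98-interrupt count in the test is only an example.

```c
#include <stdint.h>

/* Target only (never touched in a host build), e.g. SCB_VTOR = 0x08000000u; */
#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08u)

/* Required alignment in bytes for a table of `entries` words
   (16 core entries + number of vendor interrupts). */
static uint32_t vtor_alignment(uint32_t entries)
{
    uint32_t bytes = entries * 4u;
    uint32_t align = 128u;          /* architectural minimum: 32 words */
    while (align < bytes)
        align <<= 1;                /* round up to the next power of two */
    return align;
}

/* Non-zero if `table_addr` is a legal VTOR value for that table size. */
static int vtor_address_ok(uint32_t table_addr, uint32_t entries)
{
    return (table_addr & (vtor_alignment(entries) - 1u)) == 0;
}
```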

This is a minimal example, so the stack pointer and memory sizes are set small. These numbers should work for a Cortex-M7; for a Cortex-M0 and maybe some others they might actually be too big (more flash or SRAM than the part has) and fail.

All the Cortex-Ms (and in fact the ARM cores generally, short of the 64-bit instruction set) support the original Thumb instructions from ARMv4T, and you don't need to venture past that for a minimal starting point. It's not bad to have skeleton code like that in your back pocket and choose the core later. Basically, don't borrow from your Cortex-M7 code and build it for a Cortex-M0, which doesn't support the same set of Thumb-2 extensions; it may not work.

Build (for a Cortex-M0 for now: ARMv6-M, the original Thumb instructions plus a couple dozen Thumb-2 instructions):

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.elf -T flash.ld flash.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary

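Not all of these command-line options are necessarily required; it depends on your project, your version of the GNU tools, etc. The code is written so that any arm-whatever- toolchain works (arm-none-eabi, arm-linux-gnueabi, etc.), since it uses neither the toolchain's C library nor its startup files.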

For it to boot properly, the vector table needs to be up front and formed properly. That's a good thing to check before programming it into the flash of a new part; you don't want to brick the thing just after you got it...

Disassembly of section .text:

00000000 <_start>:
   0:   20001000    andcs   r1, r0, r0
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   0000001b    andeq   r0, r0, r11, lsl r0
   c:   0000001b    andeq   r0, r0, r11, lsl r0
  10:   0000001b    andeq   r0, r0, r11, lsl r0

00000014 <reset>:
  14:   f000 f808   bl  28 <notmain>
  18:   e7ff        b.n 1a <hangout>

0000001a <hangout>:
  1a:   e7fe        b.n 1a <hangout>

The disassembly of the table is bogus of course; I used the disassembler to see these words rather than some other dump tool. The first word, at address zero, is the stack pointer init value. Some bootloaders/chips require it to be something sane, some don't. You don't HAVE TO use it as your stack pointer init; you can always do it the old-fashioned way and set SP in the reset handler. I was just reading up on a part that was new to me (I have tried most of the vendors at this point), and its documentation did say that value had to be within some range before it would boot.

The rest of the vectors, reset and the others, need to be the handler address ORed with 1 (the Thumb bit), so reset is at 0x14 and 0x14|1 = 0x15. Check. The same goes for the other few vectors I put in there. You would normally want to cover at least the exceptions, and then if you enable any interrupts, fill up the table with those as well. There is nothing magical about this memory space, it's just flash: if you are not using parts of the vector table you can put code or data there, but if you then do get that interrupt or exception and don't have a sane handler, no joy.
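Since checking the table is worth doing before every flash of a new part, here is a host-side sketch that reads the first words of a raw image such as notmain.bin (little-endian) and applies the rules above: SP inside RAM and 8-byte aligned (the AAPCS stack alignment), every handler with the Thumb bit set and pointing inside the image. The function name, parameters and return codes are my own; reserved slots in a full table are zero and would be flagged, so pass only the number of entries you actually filled.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a raw flash image. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns -1 if the first `nvec` entries look sane, -2 if the image is
   too short, otherwise the index of the first suspicious entry. */
static int check_vector_table(const uint8_t *img, size_t len, size_t nvec,
                              uint32_t flash_base,
                              uint32_t ram_start, uint32_t ram_end)
{
    if (nvec == 0 || len < nvec * 4)
        return -2;

    /* Full-descending stack: the top of RAM itself is a valid value. */
    uint32_t sp = read_le32(img);
    if (sp <= ram_start || sp > ram_end || (sp & 7u) != 0)
        return 0;

    for (size_t i = 1; i < nvec; i++) {
        uint32_t v = read_le32(img + 4 * i);
        uint32_t target = v & ~1u;          /* strip the Thumb bit */
        if ((v & 1u) == 0)
            return (int)i;                  /* Thumb bit missing */
        if (target < flash_base || target - flash_base >= len)
            return (int)i;                  /* points outside the image */
    }
    return -1;
}
```

For the first example above, after reading notmain.bin into a buffer you would call check_vector_table(buf, n, 5, 0x00000000, 0x20000000, 0x20001000); -1 means nothing obvious is wrong.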

I like to abstract my accesses for many reasons; a lot of folks don't. You choose how you want to do it...
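If you would rather do the abstraction in C than in assembly, volatile pointer accessors are the usual alternative. A sketch with hypothetical lowercase names; one difference from the asm PUT32/GET32: these are inline, so the compiler sees straight through them, and only volatile keeps each access from being merged or dropped, whereas a call into a separate .s file is opaque to the optimizer.

```c
#include <stdint.h>

/* C counterparts of PUT32/GET32 (hypothetical names). volatile forces
   exactly one 32-bit store/load per call. */
static inline void put32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t get32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Read-modify-write: clear `clear_mask`, then set `set_mask`. */
static inline void modify32(uintptr_t addr, uint32_t clear_mask, uint32_t set_mask)
{
    put32(addr, (get32(addr) & ~clear_mask) | set_mask);
}
```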

As you can see, this will keep writing to SRAM at 0x20000100 (assuming your SRAM starts at 0x20000000 and not, say, 0x40000000; 0x20000000 is a very popular choice among the vendors using Cortex-M cores).

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m7 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m7 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.elf -T flash.ld flash.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary

Change it to a Cortex-M7... and, well, there was nothing in this project that a Thumb-2 instruction could do better.

A nice thing about the Cortex-M architecture design: the handlers can be plain C functions, so the bootstrap can shrink to just the table.

flash.s just the table

.thumb
.thumb_func
.global _start
_start:
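stacktop: .word 0x20001000   @ initial stack pointer
.word notmain                @ reset goes straight to the C entry point
.word hangout                @ NMI (C function)
.word hangout                @ HardFault
.word hangout                @ MemManage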

notmain.c

#define SOME_RAM (*((volatile unsigned int *)0x20000100))
void notmain ( void )
{
    unsigned int ra;
    for(ra=0;;ra++) SOME_RAM=ra;
}
void hangout ( void )
{
    while(1) continue;
}

build

Disassembly of section .text:

00000000 <_start>:
   0:   20001000    andcs   r1, r0, r0
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   00000025    andeq   r0, r0, r5, lsr #32
   c:   00000025    andeq   r0, r0, r5, lsr #32
  10:   00000025    andeq   r0, r0, r5, lsr #32

00000014 <notmain>:
  14:   2300        movs    r3, #0
  16:   4a02        ldr r2, [pc, #8]    ; (20 <notmain+0xc>)
  18:   6013        str r3, [r2, #0]
  1a:   3301        adds    r3, #1
  1c:   e7fc        b.n 18 <notmain+0x4>
  1e:   bf00        nop
  20:   20000100    andcs   r0, r0, r0, lsl #2

00000024 <hangout>:
  24:   e7fe        b.n 24 <hangout>
  26:   bf00        nop

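The logic itself conforms to ARM's calling convention: the core loads SP from the table before entering the reset handler, and on every other exception entry it stacks r0-r3, r12, lr, pc and xPSR, the registers a C function is allowed to clobber. So if the compiler conforms as well and you don't want to wrap the reset handler, you don't need to.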
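Taking that one step further, the table itself can move into C and flash.s disappears. A sketch assuming GCC: the .vectors section name is my choice, and flash.ld would need something like .text : { KEEP(*(.vectors)) *(.text*) } > rom so the table lands first (KEEP matters only if you ever use --gc-sections). Without a _start symbol ld will likely warn about a missing entry symbol, which does not matter for a raw .bin. On a non-ARM host the section attribute is left out so the file can be compiled and inspected.

```c
#include <stdint.h>

typedef void (*vector_t)(void);

#define SOME_RAM (*((volatile unsigned int *)0x20000100))

void notmain ( void )
{
    unsigned int ra;
    for(ra=0;;ra++) SOME_RAM=ra;
}
void hangout ( void )
{
    while(1) continue;
}

#if defined(__arm__)
/* Hypothetical section; the linker script must place it at 0x00000000. */
#define VECTORS __attribute__((section(".vectors"), used))
#else
#define VECTORS /* host build: default placement, for inspection only */
#endif

VECTORS const vector_t vector_table[] = {
    (vector_t)0x20001000u,  /* initial SP, top of the 4K ram in flash.ld */
    notmain,                /* reset */
    hangout,                /* NMI */
    hangout,                /* HardFault */
    hangout,                /* MemManage */
};
```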

I never need to zero .bss or init .data in my projects, but many folks do, and that makes the linker script more complicated (it doesn't need to be as crazy as most folks make it), plus a little more assembly to zero .bss and copy .data.
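For those who do want .data and .bss initialized, the extra startup work is just two loops: copy the .data load image from flash to its RAM address, then zero .bss. Sketched in C with the bounds passed in as parameters; on target they would come from linker-script symbols, and the names in the comment (_sidata, _sdata, _edata, _sbss, _ebss) are hypothetical, use whatever your linker script exports. It has to run before anything relies on initialized globals, i.e. before notmain().

```c
#include <stdint.h>

/* On target, the reset handler would do something like:
     extern uint32_t _sidata[], _sdata[], _edata[], _sbss[], _ebss[];
     init_data_bss(_sidata, _sdata, _edata, _sbss, _ebss);
     notmain();
   (symbol names hypothetical, defined in the linker script). */
static void init_data_bss(const uint32_t *data_load,   /* .data image in flash */
                          uint32_t *data_start, uint32_t *data_end,
                          uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;   /* copy initialized data to RAM */
    while (bss_start < bss_end)
        *bss_start++ = 0;               /* zero uninitialized data */
}
```

One catch: gcc may recognize loops like these and turn them into calls to memcpy/memset, which do not exist in a -nostdlib build; -fno-tree-loop-distribute-patterns, or doing the loops in assembly as suggested above, avoids that.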

A working led blinker for a particular cortex-m7 microcontroller.

void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );

#define RCCBASE 0x40023800
#define RCC_AHB1ENR   (RCCBASE+0x30)
#define RCC_AHB1LPENR (RCCBASE+0x50)

#define GPIOABASE 0x40020000
#define GPIOA_MODER     (GPIOABASE+0x00)
#define GPIOA_OTYPER    (GPIOABASE+0x04)
#define GPIOA_BSRR      (GPIOABASE+0x18)

#define GPIOBBASE 0x40020400
#define GPIOB_MODER     (GPIOBBASE+0x00)
#define GPIOB_OTYPER    (GPIOBBASE+0x04)
#define GPIOB_BSRR      (GPIOBBASE+0x18)

//PA5 or PB0 defaults to PB0
//PB7
//PB14

int notmain ( void )
{
    unsigned int ra;
    unsigned int rx;

    ra=GET32(RCC_AHB1ENR);
    ra|=1<<1; //enable GPIOB
    PUT32(RCC_AHB1ENR,ra);

    ra=GET32(GPIOB_MODER);
    ra&=~(3<<(0<<1)); //PB0
    ra|= (1<<(0<<1)); //PB0
    ra&=~(3<<(7<<1)); //PB7
    ra|= (1<<(7<<1)); //PB7
    ra&=~(3<<(14<<1)); //PB14
    ra|= (1<<(14<<1)); //PB14
    PUT32(GPIOB_MODER,ra);
    //OTYPER
    ra=GET32(GPIOB_OTYPER);
    ra&=~(1<<0); //PB0
    ra&=~(1<<7); //PB7
    ra&=~(1<<14); //PB14
    PUT32(GPIOB_OTYPER,ra);

    for(rx=0;;rx++)
    {
        PUT32(GPIOB_BSRR,((1<<0)<<0)|((1<<7)<<16)|((1<<14)<<0));
        for(ra=0;ra<200000;ra++) dummy(ra);
        PUT32(GPIOB_BSRR,((1<<0)<<16)|((1<<7)<<0)|((1<<14)<<16));
        for(ra=0;ra<200000;ra++) dummy(ra);
    }
    return(0);
}
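The MODER and BSRR lines in the blinker pack a lot into shifts. Here is the same arithmetic pulled into two pure helpers so it can be checked on a PC: MODER uses two bits per pin (01 = general-purpose output), and BSRR sets pin n with bit n and resets it with bit n+16, which is why the blinker needs no read-modify-write to flip the LEDs. The helper names are hypothetical; the bit layout is the STM32-style one the blinker already uses.

```c
#include <stdint.h>

/* MODER: two bits per pin; 01 selects general-purpose output. */
static uint32_t moder_output(uint32_t moder, unsigned pin)
{
    moder &= ~(3u << (pin * 2));   /* clear the pin's two mode bits */
    moder |=  (1u << (pin * 2));   /* set mode 01 = output */
    return moder;
}

/* BSRR: bit n sets pin n, bit n+16 resets it; zero bits do nothing. */
static uint32_t bsrr_value(uint16_t set_pins, uint16_t reset_pins)
{
    return (uint32_t)set_pins | ((uint32_t)reset_pins << 16);
}
```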

You don't HAVE TO use the HAL or CMSIS or other third-party resources. Professionally you should know how to, or periodically try them, but one of the best things about bare-metal programming is that you are only truly limited by the hardware and its rules: you can generate code that functions however you want, so long as it conforms to the rules of the logic of the chip and board.

gcc is thankfully just a compiler: it turns C into assembly, as turns assembly into objects, and ld links these things based on command-line or linker-script direction. When you start doing things that require libgcc (division, multiplication, floating point) or you start using the C library, now the compiler build matters (arm-none-eabi vs arm-whatever-linux-whatever) and the library, C or other, matters. gcc for some reason is built to find libgcc relative to the path of gcc, but ld cannot, so, as ugly as it is, if you find yourself in that position you may choose to use gcc to call the linker. I let gcc call the assembler because I have no real reason not to. But when gcc calls the linker, you have to defeat the default bootstrap and linker script, if there is one. Calling the linker directly, you control all of that and don't need all of those gcc command-line options.
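To see why division drags in libgcc: a Cortex-M0 (ARMv6-M) has no divide instruction, so for an unsigned a / b gcc emits a call to the libgcc helper __aeabi_uidiv, and a bare ld link then fails with an undefined reference. The Cortex-M7 (ARMv7E-M) has UDIV/SDIV, so the same source compiles inline. You can hand ld the library path printed by arm-none-eabi-gcc -print-libgcc-file-name, or, for the odd division, write your own. Below is a sketch of the classic shift-subtract method, not libgcc's implementation; the name and the divide-by-zero policy (quotient 0, loosely mirroring UDIV with trapping disabled, remainder = dividend) are my own choices.

```c
#include <stdint.h>

/* Restoring (shift-subtract) unsigned division, one quotient bit per
   step. The partial remainder never exceeds the dividend bits consumed
   so far, so the left shift cannot overflow 32 bits. */
static uint32_t udivmod32(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t q = 0, r = 0;

    if (den == 0) {                 /* policy choice for this sketch */
        if (rem) *rem = num;
        return 0;
    }
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((num >> bit) & 1u);   /* bring down next bit */
        if (r >= den) {
            r -= den;
            q |= 1u << bit;
        }
    }
    if (rem) *rem = r;
    return q;
}
```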

At least once in my career I have dealt with tools that add more junk if they see main(), so use an entry point that is not called main(). You are free to name your C entry point whatever you like, or have several if you want the bootstrap to call more than one function...

So in short: this core/family uses a vector table; other processors do not. You have to master the tools enough to get the processor to boot, and that means a correct vector table in the right place. You need to know the minimum requirements of your compiler's output: usually set the stack pointer and call the entry point, or branch to it if it never returns. The linker usually requires some hand-holding, anywhere from -Ttext=0x0 -Tdata=0x20000000 to a linker script for ld. Don't expect linker scripting languages to be remotely the same from one toolchain to another (GNU, Keil, ARM, etc.); my recommendation, if you ever plan to port, is to use as little toolchain-specific stuff as you can. Then CHECK THE BINARY before you try to use it: on some chips, once the core hangs during a flash-based boot you can't get back in. With the STM32s you can recover, with some others you can, with some you can't.

Getting the binary, in whatever form, onto the flash of the part is another discussion. Start with the chip vendor's documentation.