arm-none-eabi-gcc with nucleo L432KC board

I am looking for a procedure to compile and upload my code for the STM32L432KC NUCLEO board from the Linux terminal, like the procedure I used with my atmega328p (here).

I kind of got attached to using vim and the gdb debugger, and I was happy doing so for a while with my AVR atmega328p, using avr-gcc, and avra for assembly. But now I want to move on and dive deeper into embedded systems, so I bought this NUCLEO board (documentation page).

So I just need a small tutorial, like the one above, for compiling, linking and flashing the code without the need to install any IDEs.

Solution

The STM32 chips are all Cortex-M based (a core ST licenses from ARM), and so far they all support the Cortex-M0 instruction set (armv6-m). Follow the ST documentation to find which Cortex-M core your chip has (the STM32L432KC is a Cortex-M4), then go to the technical reference manual for that core on ARM's website (infocenter.arm.com). It says which architecture (armv6-m, armv7-m, armv8-m...) the core implements, and in the architecture manual you find out about the instruction set and the architecture. You should not start this journey without the minimum documents: the ARM TRM and ARM ARM for the core and architecture, and the REFERENCE manual from ST (not the programmer's manual from either of them) and the datasheet from ST.

The Cortex-Ms boot from a vector table, described in the Architecture Reference Manual (ARM ARM). The first word is loaded into the stack pointer; the second is the reset vector, which is required to have its lsbit set to 1 (indicating a Thumb function address). You can read about the rest; for a minimal example, that is good enough.
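To make that layout concrete, here is a small host-side C sketch (not part of the original answer) that pulls the first two vector-table words out of a raw image such as notmain.bin. It assumes a little-endian image, which is what these Cortex-M parts use; the names `vector_head`, `read_vector_head` and `reset_is_thumb` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* First two entries of a Cortex-M vector table. */
struct vector_head {
    uint32_t initial_sp;   /* word 0: loaded into SP on reset */
    uint32_t reset;        /* word 1: reset vector, lsbit must be 1 (Thumb) */
};

/* Assemble a 32-bit little-endian word from four bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode the head of the vector table from a raw binary image.
   Returns 0 on success, -1 if the image is too short. */
int read_vector_head(const uint8_t *image, size_t len, struct vector_head *out)
{
    if (len < 8)
        return -1;
    out->initial_sp = le32(image);
    out->reset = le32(image + 4);
    return 0;
}

/* Nonzero if the reset vector marks a Thumb address, as the core requires. */
int reset_is_thumb(const struct vector_head *v)
{
    return (v->reset & 1u) != 0;
}
```

For the example built in this answer, the first eight bytes of notmain.bin should decode to 0x20001000 and 0x08000011, matching the list file.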

All of the STM32 chips I have worked with (and I have worked with a ton of them) have user flash based at 0x08000000 and SRAM at 0x20000000, and some of the newer firmware that comes with NUCLEO boards insists on the proper 0x08000000 addresses in the vector table (a small percentage of parts also support a faster memory address at 0x00200000). The ARM documentation will basically say 0x00000000, or point at VTOR, but in practice 0x00000000 is generally the address the logic reads to find the vector table on reset. There are various ways to skin this cat; ST chooses to mirror a portion of the flash at 0x00000000.
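A small C sketch of that memory map can serve as a sanity check on a vector table before copying a binary to the board. The window lengths are assumptions: they are taken from the 0x1000-byte regions in the flash.ld below, not from the STM32L432KC datasheet, so widen them to match your part. `classify_addr`, `initial_sp_ok` and the `region` enum are hypothetical names.

```c
#include <stdint.h>

/* Bases as described above. The lengths are ASSUMPTIONS copied from the
   0x1000-byte regions in flash.ld; widen them to match your part. */
#define ALIAS_BASE 0x00000000u  /* ST mirrors part of the flash here */
#define FLASH_BASE 0x08000000u
#define FLASH_LEN  0x00001000u
#define SRAM_BASE  0x20000000u
#define SRAM_LEN   0x00001000u

enum region { REGION_OTHER, REGION_ALIAS, REGION_FLASH, REGION_SRAM };

/* Classify an address. Unsigned wraparound makes (addr - base < len)
   a single-compare range check. */
enum region classify_addr(uint32_t addr)
{
    if (addr - FLASH_BASE < FLASH_LEN)
        return REGION_FLASH;
    if (addr - SRAM_BASE < SRAM_LEN)
        return REGION_SRAM;
    if (addr - ALIAS_BASE < FLASH_LEN)
        return REGION_ALIAS;
    return REGION_OTHER;
}

/* The initial SP may equal the end of RAM: the stack is full descending,
   so the first push decrements SP before storing. */
int initial_sp_ok(uint32_t sp)
{
    return sp > SRAM_BASE && sp - SRAM_BASE <= SRAM_LEN && (sp & 3u) == 0;
}
```

A reset vector that classifies as REGION_ALIAS rather than REGION_FLASH is the kind of image the newer NUCLEO firmware may refuse.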

So a very simple example to get you started.

Bootstrap, flash.s

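.thumb

.thumb_func
.global _start
_start:
.word 0x20001000    @ initial stack pointer: top of the first 0x1000 bytes of SRAM
.word reset         @ reset vector; .thumb_func makes the linker set the lsbit
.word loop          @ NMI
.word loop          @ HardFault

.thumb_func
reset:
    bl notmain      @ call the C entry point
    b loop          @ if it ever returns, spin
.thumb_func
loop:   b .

.thumb_func
.globl bounce
bounce:
    bx lr           @ do nothing, just return to the caller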

Main/entry C code notmain.c

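extern void bounce ( unsigned int );  /* in flash.s: returns immediately */
unsigned int x;                       /* in .bss, which this bootstrap does not zero */
void notmain ( void )
{
    unsigned int ra;

    x=5;                              /* set at runtime instead of relying on .data */
    for(ra=0;;ra++) bounce(ra);       /* a call the compiler cannot optimize away */
}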

Some tools, including the GNU tools, will see main() and add stuff to the binary that we sometimes do not want. Some toolchains are worse than others. Naming the entry point notmain() is a trivial way to defeat that. YMMV.

linker script flash.ld

MEMORY
{
    rom : ORIGIN = 0x08000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
    .bss    : { *(.bss*)    } > ram
}

While the command line -Ttext= and -Tdata= and such are present in the linker, I would advise against using them except for the occasional Stack Overflow answer when being lazy. There are some unpleasant issues with the command line options including one very long-standing bug that they still have not fixed (the tools are magically kludged to work with the result of the bug so nobody cares I guess).

build

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy -O binary notmain.elf notmain.bin

My code is designed not to care about things like arm-none-eabi- vs arm-linux-gnueabi-; the last 10 or 15 years of arm-whatever-whatever should work. I am simply using the compiler as a compiler and the linker as a linker, with as close to zero toolchain-specific magic as possible.

Always check the output, notmain.list, to see that, if nothing else, the vector table looks good. (I used the disassembler, so the tool also tries to disassemble the vector words; just ignore the disassembly there.)

notmain.elf:     file format elf32-littlearm


Disassembly of section .text:

08000000 <_start>:
 8000000:   20001000    andcs   r1, r0, r0
 8000004:   08000011    stmdaeq r0, {r0, r4}
 8000008:   08000017    stmdaeq r0, {r0, r1, r2, r4}
 800000c:   08000017    stmdaeq r0, {r0, r1, r2, r4}

08000010 <reset>:
 8000010:   f000 f804   bl  800001c <notmain>
 8000014:   e7ff        b.n 8000016 <loop>

08000016 <loop>:
 8000016:   e7fe        b.n 8000016 <loop>

08000018 <bounce>:
 8000018:   4770        bx  lr
    ...

0800001c <notmain>:
 800001c:   2205        movs    r2, #5
 800001e:   b510        push    {r4, lr}
 8000020:   2400        movs    r4, #0
 8000022:   4b03        ldr r3, [pc, #12]   ; (8000030 <notmain+0x14>)
 8000024:   601a        str r2, [r3, #0]
 8000026:   0020        movs    r0, r4
 8000028:   f7ff fff6   bl  8000018 <bounce>
 800002c:   3401        adds    r4, #1
 800002e:   e7fa        b.n 8000026 <notmain+0xa>
 8000030:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <x>:
20000000:   00000000    andeq   r0, r0, r0

The address space is right: 0x08000000.

08000000 <_start>:
 8000000:   20001000
 8000004:   08000011
 8000008:   08000017
 800000c:   08000017

The vector entries are the handler addresses ORed with 1, which is correct. Not all STM32s have 0x1000 bytes of RAM, so sometimes you make that smaller.
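The ORing rule can be written out in C to check the list file by hand. This only illustrates the arithmetic; in the actual build the work is done by .thumb_func and the linker. `thumb_vector` and `handler_of` are hypothetical names.

```c
#include <stdint.h>

/* A vector entry is the handler address with bit 0 set to mark Thumb state.
   OR (not add) so an entry that already has bit 0 set is left unchanged. */
uint32_t thumb_vector(uint32_t handler_addr)
{
    return handler_addr | 1u;
}

/* Recover the handler address from a vector entry (clear the Thumb bit). */
uint32_t handler_of(uint32_t vector_entry)
{
    return vector_entry & ~1u;
}
```

With the listing above, reset at 0x08000010 gives 0x08000011 and loop at 0x08000016 gives 0x08000017. If an entry in your list file comes out even, the symbol was probably not marked .thumb_func, and the core will fault when it uses that vector.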

The minimal-ish linker script and the minimal bootstrap only work if you do not rely on .data being initialized and .bss being zeroed.

08000010 <reset>:
 8000010:   f000 f804   bl  800001c <notmain>
 8000014:   e7ff        b.n 8000016 <loop>

For this example it works great. You can grossly overcomplicate your linker script all you want, just like everyone else does, or you can keep it simple. Being bare-metal, you are already going to sacrifice the C library, so what if you have to init your global variables at runtime? You should never read memory before writing it, so .bss doesn't need to be zeroed either.
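If you do later decide to let the bootstrap handle .data and .bss, the usual job is a word copy and a word clear. This sketch takes the section bounds as parameters; in a real build they would come from symbols you define in an extended linker script. The flash.ld above defines none, so any such symbol names, like `init_sections` itself, are your own hypothetical additions.

```c
#include <stdint.h>

/* Copy initialized data from its load image in flash to its run address in
   RAM, then zero .bss. All bounds are word pointers, end-exclusive. */
void init_sections(const uint32_t *data_load,
                   uint32_t *data_start, uint32_t *data_end,
                   uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

Call it first thing, before any code that relies on initialized globals. Note that GCC may recognize loops like these and emit calls to memcpy/memset even with -ffreestanding, which will not link with -nostdlib unless you provide those functions; -fno-tree-loop-distribute-patterns is one way to prevent that.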

You can use this for trivial delays in your first blink-the-LED program (the hello world of bare-metal):

for(ra=0;;ra++) bounce(ra);

Because bounce is not in the optimization domain of notmain.c, the compiler won't remove it as dead code, as it would remove

for(ra=0;ra<100000;ra++) continue;

if you tried to do a delay that way. After your initial blink-the-LED program, start using timers in a POLLING fashion. Do not go near interrupts until you are well versed in the processor, the chip and its peripherals, and have figured out how the interrupts work in a polling fashion for that chip. If you do not, then we will see you here at SO again with questions/problems.
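When you get to timers, one polling option every Cortex-M has is the core's SysTick timer (ST's own TIMx peripherals are another; see the reference manual). This sketch uses the SysTick register layout from the ARM architecture manuals (CSR, RVR and CVR at 0xE000E010, 0xE000E014 and 0xE000E018) and polls COUNTFLAG instead of enabling the interrupt. It takes the register block as a pointer so the logic can be tried off-target; `struct systick` and the function names are hypothetical.

```c
#include <stdint.h>

/* SysTick register block as laid out in the ARMv6-M/ARMv7-M architecture
   manuals: CSR at 0xE000E010, RVR at 0xE000E014, CVR at 0xE000E018. */
struct systick {
    volatile uint32_t csr;  /* control and status */
    volatile uint32_t rvr;  /* reload value (24 bits) */
    volatile uint32_t cvr;  /* current value */
};

#define SYSTICK       ((struct systick *)0xE000E010u)
#define CSR_ENABLE    (1u << 0)
#define CSR_CLKSOURCE (1u << 2)   /* 1 = processor clock */
#define CSR_COUNTFLAG (1u << 16)  /* set on wrap, cleared by reading CSR */

/* Start SysTick counting down from the processor clock, interrupt disabled. */
void systick_start(struct systick *st, uint32_t reload)
{
    st->csr = 0;                           /* stop while configuring */
    st->rvr = reload & 0x00FFFFFFu;        /* reload is only 24 bits wide */
    st->cvr = 0;                           /* any write clears CVR and COUNTFLAG */
    st->csr = CSR_CLKSOURCE | CSR_ENABLE;  /* TICKINT stays 0: polling only */
}

/* Busy-wait for n wraps of the counter by polling COUNTFLAG. */
void systick_wait_wraps(struct systick *st, unsigned int n)
{
    while (n--) {
        while ((st->csr & CSR_COUNTFLAG) == 0)
            continue;
    }
}
```

On the board, `systick_start(SYSTICK, 4000000 - 1)` followed by `systick_wait_wraps(SYSTICK, 1)` would wait about one second, but only if the core really is running at 4 MHz. That clock rate is an assumption (the MSI clock the STM32L4 parts come out of reset on), so check RCC in the reference manual.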

The CMSIS headers are quite ugly if you actually look at them, and CMSIS is on par with unified syntax as another one of ARM's great ideas gone bad. Take your chances, in the same way that you take your chances using their libraries (just look at the source code and you will see what I mean).

The NUCLEO boards are a good starting point, as all you have to do to "program" the flash in the target MCU is copy the notmain.bin binary over: cp notmain.bin /media/user/something. The debug MCU, which creates the virtual thumb drive you see mount when you plug the board in, takes the file, programs the target MCU over SWD, and releases reset on it for you.

You are then free to use, say, openocd and flash write_image erase notmain.elf to write the flash in the target MCU, or GDB if you dare, etc. But a simple drag and drop or command-line copy will work.

I have found that over the course of a coding session the write sometimes will not work (device is full or some such thing). Unplugging and replugging the NUCLEO board will fix it.

It is hard to find these days, so look for STSW-LINK007 from ST; any version will do, as it pulls its own updates from ST. I do a firmware update on every new NUCLEO board, because I run Linux and some combinations of NUCLEO debugger firmware and Linux made the mounting of the virtual drive less reliable: you might get one shot at programming and then have to power cycle to try again. It is a Java program, so it works on Windows, Linux or Mac.

Try to blink an LED first and go from there. You need to enable the clock for the particular GPIO block (GPIOA, B, C, whatever is noted in the NUCLEO board's documentation), then make that pin an output and use BSRR to set/reset it with a delay between each. Get past that and you are on your way.
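Those steps can be sketched in C. The addresses and bit positions are assumptions from my reading of the STM32L4 documentation and must be checked against ST's reference manual and the NUCLEO-L432KC user manual: RCC at 0x40021000 with AHB2ENR at offset 0x4C (GPIOBEN is bit 1), GPIOB at 0x48000400 with MODER at offset 0x00 and BSRR at offset 0x18, and the user LED (LD3) on PB3. The register math is split into pure helpers (hypothetical names) so it can be checked on a PC; the on-target part reuses bounce() from flash.s as the delay, as described above.

```c
#include <stdint.h>

/* MODER has two bits per pin; 01 selects general-purpose output. */
uint32_t moder_set_output(uint32_t moder, unsigned int pin)
{
    moder &= ~(3u << (pin * 2));
    moder |= 1u << (pin * 2);
    return moder;
}

/* BSRR: writing 1 to bit n sets pin n, writing 1 to bit n+16 resets it. */
uint32_t bsrr_set(unsigned int pin)   { return 1u << pin; }
uint32_t bsrr_reset(unsigned int pin) { return 1u << (pin + 16); }

#if defined(__arm__) && !defined(__linux__)  /* bare-metal ARM build only */
/* ASSUMED addresses; verify them in the reference manual. */
#define RCC_AHB2ENR (*(volatile uint32_t *)(0x40021000u + 0x4Cu))
#define GPIOB_MODER (*(volatile uint32_t *)(0x48000400u + 0x00u))
#define GPIOB_BSRR  (*(volatile uint32_t *)(0x48000400u + 0x18u))
#define LED_PIN 3u  /* assumed: LD3 on PB3 */

extern void bounce(unsigned int);  /* from flash.s */

static void delay(unsigned int n)
{
    unsigned int ra;
    for (ra = 0; ra < n; ra++)
        bounce(ra);  /* external call keeps the loop from being removed */
}

void notmain(void)
{
    RCC_AHB2ENR |= 1u << 1;  /* GPIOBEN: clock the GPIOB block */
    GPIOB_MODER = moder_set_output(GPIOB_MODER, LED_PIN);
    for (;;) {
        GPIOB_BSRR = bsrr_set(LED_PIN);    /* LED on */
        delay(200000);
        GPIOB_BSRR = bsrr_reset(LED_PIN);  /* LED off */
        delay(200000);
    }
}
#endif
```

Using BSRR rather than a read-modify-write of ODR means each write touches only the bits you name, so no GET is needed in the loop.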

Most examples you will find are library based, and there is more than one flavor of library from ST. There are some cases where a board will work in the Arduino world, and the NUCLEOs likely fit into the mbed world, so you can play in that sandbox if you don't want to do stuff like the above. But for the most part the peripherals are quite easy to use, and the documentation from ST is on the better side: not the best, not the worst, but better than most. The same goes for ARM's docs. Periodically try them all so that you know what is out there, and then pick the one you want to do your projects in. These things evolve (well, not rolling your own; that is always a known quantity), so see what they have to offer.

The GNU tools have been more stable than the LLVM tools, but both work equally well. In my experience GNU's compilers have gotten worse over time, from around 4.x.x (if not 3.x.x) to the present. You would think LLVM would do better, but sadly it remains on par or sometimes makes slower code. The command line options for clang and the other tools change just about every major version, so it is a nightmare to maintain makefiles that use the LLVM tools, and I have abandoned support for them many times.

I now build LLVM for the target, GNU style, so that I don't need those command line options, and that has been doing better. gcc has been more and more sloppy in the last few major revisions, leaving in instructions that would normally have been optimized out. Both work. Note that it is not unusual to use clang to compile but binutils to assemble and link. If you build LLVM for the target (armv6m, for example), you can build a cross linker as well and use it. Sadly, they do not have an assembler, so you have to use the compiler as an assembler.

The virtual drive on the NUCLEO will put a FAIL.TXT file, without any useful information in it, if the binary is not programmed into the target MCU. Things like using 0x00000000 addresses in the vector table will do that; beyond that, you just have to debug your code/build. Also, when examining the list file, look for Thumb-2 instructions: if you are using a cortex-m0 or -m0+, or the baseline flavor of armv8-m (cortex-m23, which does not support the lion's share of the Thumb-2 extensions), those instructions will fault. So be careful when taking a project from a cortex-m4 or -m7 to a cortex-m0 as a skeleton starting point: remember to target the m0. Or always target the m0 in general, and then take advantage of the additional Thumb-2 extensions only if you have specific performance problems with the code.
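One way to keep that discipline honest is a compile-time check on the macros GCC predefines for the selected -mcpu, for example `__ARM_ARCH_6M__` for cortex-m0/m0+, `__ARM_ARCH_7M__` for cortex-m3, `__ARM_ARCH_7EM__` for cortex-m4/m7 and `__ARM_ARCH_8M_BASE__` for armv8-m baseline. The `REQUIRE_ARMV6M` switch and the `build_target` function are hypothetical; this is a sketch, not part of the original build.

```c
/* Optional guard: build with -DREQUIRE_ARMV6M (hypothetical switch) to make
   the compiler refuse anything but an armv6-m target such as cortex-m0. */
#if defined(REQUIRE_ARMV6M) && !defined(__ARM_ARCH_6M__)
#error "built for something other than armv6-m; check -mcpu"
#endif

/* Report which M-profile architecture this file was compiled for. */
const char *build_target(void)
{
#if defined(__ARM_ARCH_6M__)
    return "armv6-m";
#elif defined(__ARM_ARCH_8M_BASE__)
    return "armv8-m baseline";
#elif defined(__ARM_ARCH_7M__)
    return "armv7-m";
#elif defined(__ARM_ARCH_7EM__)
    return "armv7e-m";
#elif defined(__ARM_ARCH_8M_MAIN__)
    return "armv8-m mainline";
#else
    return "not an M-profile ARM build";
#endif
}
```

On a PC this simply returns the fallback string; its real use is the #error line, which stops a build copied from an m4 project before anything is flashed. It only covers C files, so the -mcpu on the assembler line still has to be checked by eye.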