
Executing arbitrary code on-the-fly via gdb on microcontroller target?


Let me try to explain what I'm looking for, as I couldn't find a better wording for the title.

Let's say I'm programming an RP2040 microcontroller, and I can establish a debugging session with it using gdb and openocd. (Note that even though I'm discussing a concrete MCU platform here, I'm interested in whether this approach is achievable in general, with any sort of "external microcontroller" that gdb might be able to target.)

Now let's say I want to do some (relatively simple) process with external hardware: for the sake of example, let's say I want to turn some GPIO pin on, wait for 2000 CPU cycles, and then set the same GPIO off. Even with such a simple example, this requires hardware initialization, so in all, in firmware code I'd have to do something like (C using pico-sdk):

#define MY_PIN_NR 12
static inline void my_hardware_init(void) {
  gpio_init(MY_PIN_NR);
  gpio_set_dir(MY_PIN_NR, GPIO_OUT);
}

static inline void my_hardware_do_process(void) {
  // raise pin high:
  gpio_put(MY_PIN_NR, 1);
  // wait for 2000 CPU cycles
  uint16_t cycles_to_wait = 2000;
  while(cycles_to_wait--) {
    asm volatile("nop");
  }
  // set pin low:
  gpio_put(MY_PIN_NR, 0);
}

void my_hardware_full_process(void) {
  // ensure hardware is initialized
  my_hardware_init();
  // do process:
  my_hardware_do_process();
}

If this is compiled in firmware and burned in Flash, I can call it directly on the target microcontroller in a GDB session with, say:

(gdb) call my_hardware_full_process()

(or even just p my_hardware_full_process()); then even if the debugger has the microcontroller halted on a breakpoint, the function still executes, and then returns back to the debugger.

Now, this implies that there is actual code burned on the Flash (starting at the address that gdb resolves as the location of the symbol my_hardware_full_process).

So, my question is - can I somehow do something similar, that is, perform the execution of the same code as in my_hardware_full_process, but if the microcontroller Flash is fully erased/uninitialized? (which means that the microcontroller has no code to run, and therefore does not run any code - note gdb via openocd can still hook into this state). In this case, even if gdb gets an address of my_hardware_full_process from the .elf file, it will still be an address that does not contain runnable code, so the approach with (gdb) call function-symbol() fails.

Thinking about this, I was speculating, maybe it is possible to compile a "binary blob", that would contain the assembly for my_hardware_full_process() function - for instance, arm-none-eabi-objdump -S --disassemble=my_hardware_full_process firmware.elf here would give:

Disassembly of section .text:

10000310 <my_hardware_full_process>:
  }
  // set pin low:
  gpio_put(MY_PIN_NR, 0);
}

void my_hardware_full_process(void) {
10000310:       b510            push    {r4, lr}
  gpio_init(MY_PIN_NR);
10000312:       200c            movs    r0, #12
10000314:       f003 fcf2       bl      10003cfc <gpio_init>
 * Switch all GPIOs in "mask" to output
 *
 * \param mask Bitmask of GPIO to set to output, as bits 0-29
 */
static inline void gpio_set_dir_out_masked(uint32_t mask) {
    sio_hw->gpio_oe_set = mask;
10000318:       23d0            movs    r3, #208        ; 0xd0
1000031a:       061b            lsls    r3, r3, #24
1000031c:       2280            movs    r2, #128        ; 0x80
1000031e:       0152            lsls    r2, r2, #5
10000320:       625a            str     r2, [r3, #36]   ; 0x24
    sio_hw->gpio_set = mask;
10000322:       615a            str     r2, [r3, #20]
  uint16_t cycles_to_wait = 2000;
10000324:       22fa            movs    r2, #250        ; 0xfa
10000326:       00d2            lsls    r2, r2, #3
  while(cycles_to_wait--) {
10000328:       e001            b.n     1000032e <my_hardware_full_process+0x1e>
    asm volatile("nop");
1000032a:       46c0            nop                     ; (mov r8, r8)
  while(cycles_to_wait--) {
1000032c:       001a            movs    r2, r3
1000032e:       1e53            subs    r3, r2, #1
10000330:       b29b            uxth    r3, r3
10000332:       2a00            cmp     r2, #0
10000334:       d1f9            bne.n   1000032a <my_hardware_full_process+0x1a>
    sio_hw->gpio_clr = mask;
10000336:       23d0            movs    r3, #208        ; 0xd0
10000338:       061b            lsls    r3, r3, #24
1000033a:       2280            movs    r2, #128        ; 0x80
1000033c:       0152            lsls    r2, r2, #5
1000033e:       619a            str     r2, [r3, #24]
  // ensure hardware is initialized
  my_hardware_init();
  // do process:
  my_hardware_do_process();
}
10000340:       bd10            pop     {r4, pc}

Disassembly of section .data:

So, basically, I'd need this code, plus wherever <gpio_init> and dependencies jump to - in essence, a "static build", as known on PCs. In principle, I can imagine a "static build" blob that "includes" all the requirements/dependencies required to run (in this case) the my_hardware_full_process function.

The question then, becomes: can I somehow use gdb to read this kind of a "static build binary blob" file on the PC, and then somehow "push" the instructions and their data to the microcontroller, and have the blob's instructions executed there (that is, "on-the-fly"), so the hardware performs the expected function (after which, control is returned to gdb prompt) -- even if Flash memory is fully erased?

If so, how could I create such a "static build binary blob", and how could I instruct gdb to run it on the target microcontroller?


Solution

  • In general, of course. That's what you do with a debugger. In general there is no need to run it from flash. MCUs have RAM; the Pico's chip in particular does not have flash, just RAM (the flash is off chip, on the board). You would build the binary like any other, download it using gdb if you wish, then run it and stop it.

    Now, C functions do not run on their own, and library calls, even if you think they are built statically, are not expected to run as a function in isolation. C has to be bootstrapped: stack, .data, .bss, etc. And these vendor libraries tend to have extra stuff in the linker script and bootstrap, behind the scenes, that the library calls need in order to work. It is like trying to drive a car without starting it: leaning on the steering wheel and pushing the gas pedal will not get you anywhere (you might crash if you let off the parking brake).

    You would need to design the isolated function for this use case and resolve the prerequisites.
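
    As a rough illustration of what "bootstrapped" means here, this is a minimal sketch in C of the two memory steps a typical startup performs before main(): copying the initial values of .data and zeroing .bss. The function name bootstrap_ram and its parameters are made up for this sketch; on a real target the addresses would normally come from symbols defined in the linker script, and the stack pointer has to be set in assembly before any C runs. For an image loaded entirely into RAM by the debugger, the .data copy may be unnecessary, because the loader writes that section in place.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the memory setup a minimal C startup performs.
 * All five arguments are assumptions of this sketch; a real startup would
 * take them from linker-script symbols rather than from parameters.
 */
void bootstrap_ram(uint32_t *data_dst, const uint32_t *data_src, size_t data_words,
                   uint32_t *bss_start, size_t bss_words)
{
    /* .data: copy the initial values of initialized globals to their run-time location */
    for (size_t i = 0; i < data_words; i++) {
        data_dst[i] = data_src[i];
    }

    /* .bss: globals without an initializer are expected to read as zero */
    for (size_t i = 0; i < bss_words; i++) {
        bss_start[i] = 0;
    }
}
```

    A function called from the debugger without this setup having run sees whatever happens to be in RAM for its globals, which is one reason a library function is not expected to work in isolation.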

    What you should be doing instead is just make a normal binary that does the minimal thing you want, then download and run that in RAM (build it, as in link it, for RAM, not flash).

    Now, with IoT security problems, MCUs are starting to get protection. What the Pico has, I think, is not necessarily protection; rather, it boots through a ROM bootloader geared toward trying to find a flash and then parsing, if you will, a file system off of it that contains the binary to load into SRAM and run. I would have to consult my notes/examples and the docs to see if you can just release reset and load code into SRAM. You can definitely have a minimal program on flash that, if nothing else, gets through this boot process and leaves the processor in an infinite loop, which you can then stop from the debugger, load code into SRAM, and resume at an address.

    You have chosen one of the more difficult MCUs to try this activity with, in a number of ways. The Nucleo boards have a debugger built in, and on the larger ones (still $10 or so) that debugger can be used for other Cortex-M boards, even other brands of chips. The STM32G parts tend to have some protections, possibly making it harder to run just from RAM. The STM32F, STM32L, and older parts are not a problem at all. You can get a Blue Pill and then, for $5 or so, get a debugger board (J-Link clone). That brand and the other leading Cortex-M brands have better documentation than the RP2040. The RP2040 has some very cool features, but if you want to make some clean code that doesn't rely on libraries, then experience, digging, and hacking at it are required.

    Flash originally came only from Intel, as a discrete part, and it was essentially mapped into the address space of the processor. If you had some sort of debug interface, the debugger (openocd, etc.) only needed to know the base address; the protocol was the same because they were all Intel flash parts. But now we have SPI and I2C parts for discrete flash, and inside an MCU it is the chip vendor's own interface. So it is no longer the case that, for any processor of any form, you give me the address and I can program the flash. Now every subfamily or individual part within a product line programs differently from any other part (there is some overlap, but less than you would like), so the debugger would have to know every one of the zillions of combinations, and those folks do not care. If someone from a specific company chooses to contribute to an open source project like openocd to add support for their specific flash controller for a specific set of products, that happens. The stm32f103 in the Blue Pill, for example, is supported, if I remember right. DFU has helped to some extent, but there is a bootloader that needs to be running on the part to convert the generic DFU commands to chip-specific routines.

    Then you have the problem that a lot of MCUs have one flash bank, so you cannot execute from it while erasing/programming it, even if it is a page you are not using. You generally have to copy the programming code to RAM and jump to it, and interface with the debugger or whatever from there, so the flash can be programmed (that, or just stop the processor and control it from the debugger, which might be your use case). Some now have multiple banks and advertise flashing while running. The RP2040 does not have flash at all; the board vendor chooses one and populates it, and you can see in the on-chip bootloader source code the gyrations to try to figure out what part is out there.

    So I am not sure if you were asking the flash question because you wanted to see if you could load and run from RAM, or if you wanted to know if you could program a small blob onto the flash, or if you had to write a small blob to the flash. You certainly can, as you have stopped the processor, but why? For this use case, if I understand it, use RAM if you can. And with most products, particularly those based around a Cortex-M, you can.

    As implied above, I do not think that is the real problem here. The real problem is assuming a function can execute stand-alone. Even main() cannot execute stand-alone, in general, so that is what you need to focus on.

    To confirm that running from RAM works, just take a simple program:

    here:
    add r0,r0,#1
    b here
    

    Build it, load it to some address in RAM (it is position independent, so it only needs to be on an even address), start it, wait, stop it, then use the debugger to read register 0; resume, stop, and read register 0 again. You should see that it changes each time you stop and look.

    Making a usable blob is a whole other set of SO questions. Or just understand that you should make a small, complete program that does the minimal thing you want to do, load that, run it, and stop. Unfortunately, if you use the vendor libraries, each binary is going to reset/wipe out the settings from a prior program (one program to enable the gpio output and another to blink it without enabling it is not likely to work within the vendor's sandbox). So you may need to roll your own. As mentioned above, you need to design the "function", or really the whole blob, to be stand-alone (read: not use someone else's library).


    So I highly recommend the picoprobe path: two Picos. You do not need the UART initially, so, from the diagram, connect the top two pins on the left of the probe to the lower pins on the target Pico; those are the SWD signals. I had issues until I actually powered the target from the probe, so also connect the two power pins on the right of the probe to the two power pins on the right of the target.

    I downloaded flash_nuke.uf2 and used it on the target mcu to erase the flash.

    I simply downloaded the picoprobe.uf2 file, then followed the instructions to clone the openocd for the Pico and built that.

    Then cd to the tcl directory and

    sudo ../src/openocd -f interface/picoprobe.cfg -f target/rp2040.cfg
    
    
    Open On-Chip Debugger 0.11.0-g8e3c38f-dirty (2023-04-16-22:44)
    Licensed under GNU GPL v2
    For bug reports, read
        http://openocd.org/doc/doxygen/bugs.html
    Info : only one transport option; autoselect 'swd'
    adapter speed: 5000 kHz
    
    Info : Hardware thread awareness created
    Info : Hardware thread awareness created
    Info : RP2040 Flash Bank Command
    Info : Listening on port 6666 for tcl connections
    Info : Listening on port 4444 for telnet connections
    Info : clock speed 5000 kHz
    Info : SWD DPIDR 0x0bc12477
    Info : SWD DLPIDR 0x00000001
    Info : SWD DPIDR 0x0bc12477
    Info : SWD DLPIDR 0x10000001
    Info : rp2040.core0: hardware has 4 breakpoints, 2 watchpoints
    Info : rp2040.core1: hardware has 4 breakpoints, 2 watchpoints
    Info : starting gdb server for rp2040.core0 on 3333
    Info : Listening on port 3333 for gdb connections
    

    all good

    Then in another window

    telnet localhost 4444
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Open On-Chip Debugger
    > halt
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x01000000 pc: 0x00000178 msp: 0x20041f00
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x61000000 pc: 0x00001bd0 msp: 0x50100f4c
    > 
    

    telnet to the openocd server and halt the target.

    start.s

    .cpu cortex-m0
    .thumb
    
        mov r0,#0
    here:
        add r0,#1
        b here
    

    memmap.ld

    MEMORY
    {
        here : ORIGIN = 0x20000000, LENGTH = 0xFC
    }
    SECTIONS
    {
        .text   : { *(.text*)   } > here
    }
    

    build

    arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 start.s -o start.o
    arm-none-eabi-ld -nostdlib -nostartfiles -T memmap.ld start.o -o notmain.elf
    arm-none-eabi-objdump -D notmain.elf > notmain.list
    

    from telnet, after/while halted

    load_image /path/to/notmain.elf
    
    6 bytes written at address 0x20000000
    downloaded 6 bytes in 0.001275s (4.596 KiB/s)
    

    now resume and halt from the telnet session

    > resume 0x20000000
    > halt
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x01000000 pc: 0x00000178 msp: 0x20041f00
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x01000000 pc: 0x20000002 msp: 0x50100f4c
    > reg r0
    r0 (/32): 0x00bef3f5
    
    > resume
    > halt
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x01000000 pc: 0x00000178 msp: 0x20041f00
    target halted due to debug-request, current mode: Thread 
    xPSR: 0x01000000 pc: 0x20000002 msp: 0x50100f4c
    > reg r0
    r0 (/32): 0x01b31ecd
    
    > 
    

    And we can see that r0 is incrementing and the pc is around where we would expect it. So it is running this program, which was downloaded into a Pico that has an erased flash.


    start.s

    .cpu cortex-m0
    .thumb
    
        ldr r0,=0x20001000
        mov sp,r0
        bl notmain
        b .
    
    .thumb_func
    .globl PUT32
    PUT32:
        str r1,[r0]
        bx lr
    
    .thumb_func
    .globl GET32
    GET32:
        ldr r0,[r0]
        bx lr
    
    .globl DELAY
    .thumb_func
    DELAY:
        sub r0,#1
        bne DELAY
        bx lr
    

    notmain.c

    void PUT32 ( unsigned int, unsigned int );
    unsigned int GET32 ( unsigned int );
    void DELAY ( unsigned int );
    
    #define RESETS_BASE                 0x4000C000
    
    #define RESETS_RESET_RW             (RESETS_BASE+0x0+0x0000)
    #define RESETS_RESET_XOR            (RESETS_BASE+0x0+0x1000)
    #define RESETS_RESET_SET            (RESETS_BASE+0x0+0x2000)
    #define RESETS_RESET_CLR            (RESETS_BASE+0x0+0x3000)
    
    #define RESETS_RESET_DONE_RW        (RESETS_BASE+0x8+0x0000)
    #define RESETS_RESET_DONE_XOR       (RESETS_BASE+0x8+0x1000)
    #define RESETS_RESET_DONE_SET       (RESETS_BASE+0x8+0x2000)
    #define RESETS_RESET_DONE_CLR       (RESETS_BASE+0x8+0x3000)
    
    #define SIO_BASE                    0xD0000000
    
    #define SIO_GPIO_OUT_RW             (SIO_BASE+0x10)
    #define SIO_GPIO_OUT_SET            (SIO_BASE+0x14)
    #define SIO_GPIO_OUT_CLR            (SIO_BASE+0x18)
    #define SIO_GPIO_OUT_XOR            (SIO_BASE+0x1C)
    
    #define SIO_GPIO_OE_RW              (SIO_BASE+0x20)
    #define SIO_GPIO_OE_SET             (SIO_BASE+0x24)
    #define SIO_GPIO_OE_CLR             (SIO_BASE+0x28)
    #define SIO_GPIO_OE_XOR             (SIO_BASE+0x2C)
    
    #define IO_BANK0_BASE               0x40014000
    
    #define IO_BANK0_GPIO25_STATUS_RW   (IO_BANK0_BASE+0x0C8+0x0000)
    #define IO_BANK0_GPIO25_STATUS_XOR  (IO_BANK0_BASE+0x0C8+0x1000)
    #define IO_BANK0_GPIO25_STATUS_SET  (IO_BANK0_BASE+0x0C8+0x2000)
    #define IO_BANK0_GPIO25_STATUS_CLR  (IO_BANK0_BASE+0x0C8+0x3000)
    
    #define IO_BANK0_GPIO25_CTRL_RW     (IO_BANK0_BASE+0x0CC+0x0000)
    #define IO_BANK0_GPIO25_CTRL_XOR    (IO_BANK0_BASE+0x0CC+0x1000)
    #define IO_BANK0_GPIO25_CTRL_SET    (IO_BANK0_BASE+0x0CC+0x2000)
    #define IO_BANK0_GPIO25_CTRL_CLR    (IO_BANK0_BASE+0x0CC+0x3000)
    
    int notmain ( void )
    {
    
        PUT32(RESETS_RESET_CLR,1<<5); //IO_BANK0
        while(1)
        {
            if((GET32(RESETS_RESET_DONE_RW)&(1<<5))!=0) break;
        }
        PUT32(SIO_GPIO_OE_CLR,1<<25);
        PUT32(SIO_GPIO_OUT_CLR,1<<25);
        PUT32(IO_BANK0_GPIO25_CTRL_RW,5); //SIO
        PUT32(SIO_GPIO_OE_SET,1<<25);
        while(1)
        {
            PUT32(SIO_GPIO_OUT_XOR,1<<25);
            DELAY(0x1000000);
        }
        return(0);
    }
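
    The _RW/_XOR/_SET/_CLR defines above follow a fixed pattern (offsets 0x0000, 0x1000, 0x2000 and 0x3000 from the register), and the GPIO25 offsets 0x0C8/0x0CC are consistent with one 8-byte STATUS/CTRL pair per pin. Assuming that pattern holds for the other pins, which should be checked against the RP2040 datasheet, the addresses can be computed instead of typed out. The helper names below are hypothetical and do not come from any SDK:

```c
#include <stdint.h>

/* Alias offsets, matching the _RW/_XOR/_SET/_CLR defines in notmain.c */
enum reg_alias {
    ALIAS_RW  = 0x0000,
    ALIAS_XOR = 0x1000,
    ALIAS_SET = 0x2000,
    ALIAS_CLR = 0x3000
};

#define IO_BANK0_BASE 0x40014000u

/* Address of the given alias of a peripheral register */
static inline uint32_t alias_addr(uint32_t reg, enum reg_alias alias)
{
    return reg + (uint32_t)alias;
}

/* Assumed layout: STATUS at 8*pin, CTRL at 8*pin + 4 (0x0C8/0x0CC for pin 25) */
static inline uint32_t gpio_status_addr(unsigned int pin)
{
    return IO_BANK0_BASE + 8u * pin;
}

static inline uint32_t gpio_ctrl_addr(unsigned int pin)
{
    return IO_BANK0_BASE + 8u * pin + 4u;
}
```

    For example, gpio_ctrl_addr(12) gives 0x40014064, which would be the CTRL register for pin 12 from the question. The SIO registers are listed with their own SET/CLR/XOR offsets in the defines above, so alias_addr is only meant for the RESETS and IO_BANK0 registers here.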
    

    memmap.ld

    MEMORY
    {
        stuff : ORIGIN = 0x20000000, LENGTH = 0xFC
    }
    SECTIONS
    {
        .text   : { *(.text*)   } > stuff
    }
    

    build it

    arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 start.s -o start.o
    arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
    arm-none-eabi-ld -nostdlib -nostartfiles -T memmap.ld start.o notmain.o -o notmain.elf
    arm-none-eabi-objdump -D notmain.elf > notmain.list
    

    now on the telnet prompt you can

    > reset halt
    target halted due to debug-request, current mode: Thread 
    xPSR: 0xf1000000 pc: 0x000000ee msp: 0x20041f00
    target halted due to debug-request, current mode: Thread 
    xPSR: 0xf1000000 pc: 0x000000ee msp: 0x20041f00
    > 
    

    then load_image this new notmain.elf

    resume 0x20000000

    and on the target pico the led will blink.
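
    To get closer to the question (raise pin 12, wait, lower it, then hand control back to the debugger), the blinker can be reduced to a single pulse. This is an untested sketch that reuses the register sequence from notmain.c above with the pin number as a parameter. pulse_pin is a made-up name, the per-pin CTRL address assumes the 8-byte STATUS/CTRL layout suggested by the GPIO25 defines, and DELAY counts loop iterations rather than CPU cycles. The #else branch holds host-side stand-ins that only record the writes, so the sequence can be checked on a PC before loading it on the target.

```c
#include <stdint.h>

#ifdef __arm__
/* On the target these are the routines from start.s */
void PUT32(unsigned int, unsigned int);
unsigned int GET32(unsigned int);
void DELAY(unsigned int);
#else
/* Host stand-ins (hypothetical): log every write instead of touching hardware */
#define LOG_MAX 16
static unsigned int log_addr[LOG_MAX], log_val[LOG_MAX];
static int log_n;
static void PUT32(unsigned int addr, unsigned int val)
{
    if (log_n < LOG_MAX) { log_addr[log_n] = addr; log_val[log_n] = val; log_n++; }
}
static unsigned int GET32(unsigned int addr) { (void)addr; return 0xFFFFFFFFu; } /* reset reads as done */
static void DELAY(unsigned int n) { (void)n; }
#endif

#define RESETS_RESET_CLR        (0x4000C000u + 0x0 + 0x3000)
#define RESETS_RESET_DONE_RW    (0x4000C000u + 0x8)
#define SIO_GPIO_OUT_SET        (0xD0000000u + 0x14)
#define SIO_GPIO_OUT_CLR        (0xD0000000u + 0x18)
#define SIO_GPIO_OE_SET         (0xD0000000u + 0x24)
#define SIO_GPIO_OE_CLR         (0xD0000000u + 0x28)
#define IO_BANK0_GPIO_CTRL(pin) (0x40014000u + 8u * (pin) + 4u) /* assumed per-pin layout */

/* Drive one pin high, wait delay_count DELAY iterations (must be nonzero), drive it low */
void pulse_pin(unsigned int pin, unsigned int delay_count)
{
    PUT32(RESETS_RESET_CLR, 1u << 5);                 /* take IO_BANK0 out of reset */
    while ((GET32(RESETS_RESET_DONE_RW) & (1u << 5)) == 0) { }
    PUT32(SIO_GPIO_OE_CLR, 1u << pin);
    PUT32(SIO_GPIO_OUT_CLR, 1u << pin);
    PUT32(IO_BANK0_GPIO_CTRL(pin), 5);                /* function 5: SIO */
    PUT32(SIO_GPIO_OE_SET, 1u << pin);
    PUT32(SIO_GPIO_OUT_SET, 1u << pin);               /* pin high */
    DELAY(delay_count);
    PUT32(SIO_GPIO_OUT_CLR, 1u << pin);               /* pin low */
}
```

    On the target, notmain() would call pulse_pin(12, 2000) and return; start.s then parks the core in b . where the debugger can halt it. Whether pin 12 needs any setup beyond what the LED pin needed has not been verified here.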

    Technically you should be able to use any SWD debugger, but I forget if that failed. With picoprobe I had both boards powered by the same USB hub and it did not work; I was getting some DAP error or something. With only the probe board plugged in, the failure made it look like it found the probe but could not find the target. So I looked at the target side and, whether it came from some forum or the documentation or wherever, decided to try powering the target from the probe, and that worked.

    The example is one that can be built for flash or SRAM. For flash, the first stage bootloader is on the part, and the second stage comes from the 252 bytes on the first uf2 partition on the flash. So for my first flash attempts I made this little blinker. I forget the gory details when you move to larger programs. There is a higher-address SRAM that is part of the copy-from-flash stuff, and then 0x20000000 is a very typical address for SRAM. (Cortex-M has address space rules for chip vendors: 0x40000000 is where peripherals start; some will use 0x10000000 for SRAM, but most use 0x20000000, and some will mirror that below 0x10000000 to meet some other rule, but you can execute from the 0x20000000 space.)
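
    The address space rules mentioned in the parenthetical are the default memory map that the ARM architecture defines for Cortex-M parts, split into 512 MiB regions. A minimal sketch of that map as a C helper (the enum and function names are invented for this example, and the external RAM and external device regions are lumped together):

```c
#include <stdint.h>

enum region {
    REGION_CODE,        /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,        /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL,    /* 0x60000000 - 0xDFFFFFFF: external RAM and device */
    REGION_SYSTEM       /* 0xE0000000 - 0xFFFFFFFF */
};

/* Classify an address by its top three bits (one step per 512 MiB) */
static inline enum region cortexm_region(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  return REGION_CODE;
    case 1:  return REGION_SRAM;
    case 2:  return REGION_PERIPHERAL;
    case 7:  return REGION_SYSTEM;
    default: return REGION_EXTERNAL;
    }
}
```

    With this, 0x20000000 classifies as SRAM, the RESETS and IO_BANK0 bases as peripheral space, and the 0x10000310 address from the question's disassembly as code space. Where a vendor actually places memories inside each region is up to the vendor.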

    I have no use for gdb, so I have very near zero experience with it; it has been a decade, or closer to two. I just telnet into openocd, which I do a lot, and use load_image a lot, and sometimes

    flash write_image erase /path/file.elf
    

    for parts that are supported, then a reset on the command line or a reset of the board. I often solder a reset button onto boards like this Pico so I do not have to pull the USB cable out and plug it back in, but with openocd you can do a reset, or a reset halt if you want it to reset the part but not release the processor to execute (allowing you to download code into SRAM, then resume).

    Anyway, if you get this far then you can sort out the gdb way of loading and running. It is very much doable, no question whatsoever, I just have no reason to know and cannot help you there.


    From one of your comments, and from changes I made to get the above:

    When the flash is erased, it does appear that the first stage bootloader runs and gets to a point where it speeds up the clock on the processor. I base this on the delay counts needed to get a visual blink that is neither too fast nor too slow. If instead you build a flash image with the program:

    b .
    

    and put that on the flash, then use the probe to load the above blinker program, the blink is much slower.

    I would highly recommend that path, as who knows what else the bootloader messed with. You want to develop for a clean, post-reset system, not a system where the bootloader did stuff before you halted it.