Tags: c, optimization, usb, stack-overflow, stm32

STM32 VCP driver - pointer becomes invalid only with optimization


I am working on an embedded project with the STM32F405 microcontroller and am seeing some really confusing behavior. I am porting an existing (working) project from the STM32F1 to the STM32F4, but I have added the USB stack from ST's standard peripheral library for VCP.

If I compile the program with -O0, it runs as expected indefinitely. However, if I compile with -O2, the project runs for 10-15 minutes and then I get what looks to me like a stack overflow in ST's VCP driver code.

The actual bug manifests as a pointer (GREGS) becoming invalid even though the pointer had been used earlier in the same function. This pointer refers to the hardware interrupt configuration registers for the USB peripheral, so the underlying data did not disappear. However, when the pointer is accessed I get a fault, and I can see in my debugger that the pointer is invalid. (I've copied the function from usb_dcd_int.c below, with the troublesome lines pointed out.)

static uint32_t DCD_HandleRxStatusQueueLevel_ISR(USB_OTG_CORE_HANDLE *pdev)
{
    USB_OTG_GINTMSK_TypeDef  int_mask;
    USB_OTG_DRXSTS_TypeDef   status;
    USB_OTG_EP *ep;

    /* Disable the Rx Status Queue Level interrupt */
    int_mask.d32 = 0;
    int_mask.b.rxstsqlvl = 1;
    /*****************************************************************/
    /*********** POINTER IS READ HERE - NO PROBLEMS ******************/
    /*****************************************************************/
    USB_OTG_MODIFY_REG32( &pdev->regs.GREGS->GINTMSK, int_mask.d32, 0);

    /* Get the Status from the top of the FIFO */
    status.d32 = USB_OTG_READ_REG32( &pdev->regs.GREGS->GRXSTSP );

    ep = &pdev->dev.out_ep[status.b.epnum];

    switch (status.b.pktsts)
    {
    case STS_GOUT_NAK:
        break;
    case STS_DATA_UPDT:
        if (status.b.bcnt)
        {
          USB_OTG_ReadPacket(pdev,ep->xfer_buff, status.b.bcnt);
          ep->xfer_buff += status.b.bcnt;
          ep->xfer_count += status.b.bcnt;
        }
        break;
    case STS_XFER_COMP:
        break;
    case STS_SETUP_COMP:
        break;
    case STS_SETUP_UPDT:
        /* Copy the setup packet received in FIFO into the setup buffer in RAM */
        USB_OTG_ReadPacket(pdev , pdev->dev.setup_packet, 8);
        ep->xfer_count += status.b.bcnt;
        break;
    default:
        break;
    }

    /* Enable the Rx Status Queue Level interrupt */
    /*****************************************************************/
    /************************* GREGS == :-(   ************************/
    /*****************************************************************/
    USB_OTG_MODIFY_REG32( &pdev->regs.GREGS->GINTMSK, 0, int_mask.d32);

    return 1;
}

I am using vanilla GNU make and gcc-arm-none-eabi 5-4-2016q3 as my toolchain, ST's vanilla linker and startup script from 2015 for the STM32F405, and the VCP code came from March 2012. I'm a novice to startup and linker scripts, but I can't see anything suspicious in either one. I also don't see anything glaring in ST's VCP code, but I certainly don't understand every line.

I have three questions:

  1. Does this sound like a stack overflow?
  2. How is stack for an IRQ allocated? In ST's implementation, the VCP interrupt has a really deep call tree. Do I just need to allocate more for the VCP IRQ?
  3. Which optimizations in -O2 could cause this behavior? I'm wondering if I could selectively disable some optimizations that might help me track down my bug.

Solution

    1. If the behavior differs between the unoptimized and the optimized build, then this sounds like a problem with optimization and volatile. It is worth understanding the volatile type qualifier and how it interacts with C optimization. There are many good articles on the web.
    2. The stack just exists; you can't "allocate" more of it per interrupt (at least not in that sense on an embedded system). A function "allocates" stack only in the sense that it uses it to store local variables and saved registers, and moves the stack pointer accordingly. When an IRQ occurs, the current execution state is saved on top of the stack and then the IRQ handler is executed. You can detect a stack overflow by setting a breakpoint on access to the memory address at the end of the stack. On STM32 the stack pointer decreases when you put something on the stack. You can inspect the stack usage of functions with -fstack-usage. But I don't think this is related to the problem: compiling with a higher optimization level usually produces code that uses less stack, not more.
    3. I guess all of them, or any of them.
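Besides the breakpoint mentioned in point 2, a common way to rule a stack overflow in or out is to fill ("paint") the stack region with a known pattern at startup and later count how much of the pattern is still intact. The following is a minimal host-runnable sketch of that idea, in which an ordinary array stands in for the stack region. The names `STACK_FILL`, `stack_paint` and `stack_unused_words` are made up for illustration; on a real target the region bounds would come from linker-script symbols, whose names vary between projects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; any value unlikely to occur naturally works. */
#define STACK_FILL 0xDEADBEEFu

/* Fill the whole region with the pattern.
 * Assumption: this runs before the region is in use as a stack. */
static void stack_paint(uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL;
    }
}

/* Count untouched words, starting from the lowest address.
 * The stack grows downward, so the words nearest the lowest address
 * are the last ones to be overwritten. A result of 0 means the stack
 * has reached (or gone past) the end of its region. */
static size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    size_t unused = 0;
    while (unused < words && lowest[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

If the reported headroom is still comfortably above zero after the fault has occurred, a plain stack overflow becomes an unlikely explanation. When painting on the target itself, only the part of the region below the current stack pointer may be filled.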

    I'm not sure what you mean by "The actual bug manifests as a pointer (GREGS) getting dereferenced even though the pointer had been used earlier in the same function." The programmer's intention was to access GINTMSK twice. GINTMSK is declared volatile, so it is accessed every time it is used and the access is not optimized away. That is intentional, as GINTMSK is a memory-mapped hardware register.
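To illustrate what volatile guarantees here, the sketch below shows a read-modify-write helper in the spirit of the two USB_OTG_MODIFY_REG32 calls (clear bits to disable, set bits to enable). It is a stand-in written for this explanation, not ST's actual macro, and `reg_modify` is a hypothetical name. On a host machine a plain variable can play the role of the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a register read-modify-write helper.
 * Because the pointee is volatile, the compiler must perform both the
 * read and the write on every call, even at -O2; it may not cache the
 * value or merge two calls into one. */
static void reg_modify(volatile uint32_t *reg, uint32_t clear_mask, uint32_t set_mask)
{
    assert(reg != NULL);
    *reg = (*reg & ~clear_mask) | set_mask;
}
```

Note that the qualifier applies to the register being pointed at. A pointer value such as pdev->regs.GREGS is, as far as the description above goes, ordinary data in RAM, so volatile on the register does nothing to protect the pointer itself from being overwritten.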
    From your description, it looks like the value of pdev->regs.GREGS is modified somewhere in that switch, between the two accesses.
    USB_OTG_ReadPacket() looks fairly simple, but maybe the buffer points to the wrong location and the read overwrites the pdev structure?
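One way to test the overwrite theory is to refuse any read that would run past the end of the destination buffer and see whether the check ever trips. The sketch below shows the idea on a host machine with a simplified, hypothetical endpoint type (`demo_ep`, with an added `capacity` field); it is not ST's USB_OTG_EP, and `memcpy` merely stands in for the FIFO read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified endpoint state; not ST's USB_OTG_EP. */
typedef struct {
    uint8_t *xfer_buff;   /* next write position in the buffer */
    size_t   xfer_count;  /* bytes received so far */
    size_t   capacity;    /* total size of the destination buffer */
} demo_ep;

/* Copy len bytes only if they fit in the remaining space.
 * Returns 1 on success, 0 if the packet was rejected. */
static int demo_ep_receive(demo_ep *ep, const uint8_t *src, size_t len)
{
    assert(ep != NULL && src != NULL);
    if (ep->xfer_count > ep->capacity ||
        len > ep->capacity - ep->xfer_count) {
        return 0;  /* would write past the end of the buffer */
    }
    memcpy(ep->xfer_buff, src, len);  /* stands in for the FIFO read */
    ep->xfer_buff  += len;
    ep->xfer_count += len;
    return 1;
}
```

In the real handler, the equivalent check would sit in front of the USB_OTG_ReadPacket() call in the STS_DATA_UPDT case, with a breakpoint on the rejection path.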
    Maybe another interrupt with a different priority fires during this one and modifies the pdev structure. Try adding __disable_irq() and __enable_irq() guards.
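A guard of this kind could look roughly like the sketch below. It deliberately uses a save/restore variant (the CMSIS calls `__get_PRIMASK()` and `__set_PRIMASK()`) rather than a bare `__enable_irq()`, so that interrupts are not switched back on if they were already masked when the guard was entered. `DEMO_ON_TARGET`, `irq_save`, `irq_restore`, `guarded_add` and `fake_primask` are hypothetical names for this example; without `DEMO_ON_TARGET` a plain variable simulates PRIMASK so the code can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef DEMO_ON_TARGET
#include "stm32f4xx.h"           /* device header, provides the CMSIS intrinsics */
#else
static uint32_t fake_primask;    /* host stand-in: 0 = enabled, 1 = masked */
#endif

/* Mask interrupts and return the previous mask state. */
static uint32_t irq_save(void)
{
#ifdef DEMO_ON_TARGET
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
#else
    uint32_t primask = fake_primask;
    fake_primask = 1u;
#endif
    return primask;
}

/* Restore the mask state returned by irq_save(). */
static void irq_restore(uint32_t primask)
{
#ifdef DEMO_ON_TARGET
    __set_PRIMASK(primask);
#else
    fake_primask = primask;
#endif
}

/* Example of a guarded update to data shared with other interrupts. */
static void guarded_add(volatile uint32_t *shared, uint32_t amount)
{
    assert(shared != NULL);
    uint32_t state = irq_save();
    *shared += amount;           /* no other maskable IRQ can run here */
    irq_restore(state);
}
```

If the fault disappears once the section between the two GINTMSK accesses is guarded like this, a preempting interrupt becomes the prime suspect. The guard is a diagnostic aid rather than a fix, since it adds interrupt latency.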
    If you are porting this project anyway, consider moving to the STM32 HAL library and using STM32CubeMX to generate some of the code. The STM32 libraries get better with each version, and some of the oldest ones had problems with various optimization levels.