How can I debug "undefined instruction" faults for an LPC1788 in IAR Embedded Workbench?

I'm developing an application for an LPC1788 (Cortex-M3) microcontroller. This application involves sending and receiving CAN messages, and I find that when I subject it to heavy load and twiddle my thumbs for 30-60 minutes, a hardfault will occur.

I'm debugging the firmware through IAR Embedded Workbench 6.60.1.5104, and when this hardfault occurs it breaks at a weakly-linked default hardfault handler defined in startup_LPC177x_8x.s:

        PUBWEAK HardFault_Handler
        SECTION .text:CODE:REORDER(1)
HardFault_Handler
        B HardFault_Handler

Unfortunately, the call stack only contains the address of this handler and not whichever part of code called into it (where the error occurred).

The only useful information I've been able to gather is from the UNDEFINSTR bit being set in the NVIC:CFSR register. From the documentation:

When this bit is set, the PC value stacked for the exception return points to the undefined instruction. An undefined instruction is an instruction that the processor cannot decode. Potential reasons:

a) Use of instructions not supported in the Cortex-M device.
b) Bad or corrupted memory contents.
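
For anyone decoding the register by hand: the usage fault flags (UFSR) sit in the upper 16 bits of CFSR, so UNDEFINSTR is bit 16 of the full register. Below is a minimal sketch of that decoding. The helper names `ufsr_from_cfsr` and `usage_fault_name` are made up for illustration and do not come from CMSIS or IAR; the bit positions are the ARMv7-M ones.

```c
#include <stdint.h>

/* UFSR bit positions (UFSR is CFSR[31:16]). */
#define UFSR_UNDEFINSTR (1u << 0)  /* CFSR bit 16 */
#define UFSR_INVSTATE   (1u << 1)  /* CFSR bit 17 */
#define UFSR_INVPC      (1u << 2)  /* CFSR bit 18 */
#define UFSR_NOCP       (1u << 3)  /* CFSR bit 19 */
#define UFSR_UNALIGNED  (1u << 8)  /* CFSR bit 24 */
#define UFSR_DIVBYZERO  (1u << 9)  /* CFSR bit 25 */

/* Hypothetical helper: extract the usage fault half of a CFSR value. */
uint16_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}

/* Hypothetical helper: name of the lowest usage fault flag that is set. */
const char *usage_fault_name(uint32_t cfsr)
{
    uint16_t ufsr = ufsr_from_cfsr(cfsr);

    if (ufsr & UFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (ufsr & UFSR_INVSTATE)   return "INVSTATE";
    if (ufsr & UFSR_INVPC)      return "INVPC";
    if (ufsr & UFSR_NOCP)       return "NOCP";
    if (ufsr & UFSR_UNALIGNED)  return "UNALIGNED";
    if (ufsr & UFSR_DIVBYZERO)  return "DIVBYZERO";
    return "none";
}
```

On the target, the value passed in would be whatever is read from address 0xE000ED28.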

I've read that the value of the program counter at the point the instruction was executed is stored in one of the exception stack registers, but I'm not sure how to access these from IAR.

In case it's any help, I've included a screenshot containing some of the debugging details:

Solution

I found a solution by adapting code I found from this blog.

The user "Ramon" there helpfully posted code that put me on the right track to getting something that compiled in IAR (I had never written raw assembly in IAR Embedded Workbench before).

This is the hardfault handling code I used:

#include "lpc177x_8x.h"   

static volatile unsigned long stacked_r0 = 0;
static volatile unsigned long stacked_r1 = 0;
static volatile unsigned long stacked_r2 = 0;
static volatile unsigned long stacked_r3 = 0;
static volatile unsigned long stacked_r12 = 0;
static volatile unsigned long stacked_lr = 0;
static volatile unsigned long stacked_pc = 0;
static volatile unsigned long stacked_psr = 0;
static volatile unsigned long _cfsr = 0;
static volatile unsigned long _hfsr = 0;
static volatile unsigned long _dfsr = 0;
static volatile unsigned long _afsr = 0;
static volatile unsigned long _bfar = 0;
static volatile unsigned long _mmar = 0;

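// Forward declaration so the inline assembly below can reference the symbol.
void hard_fault_handler_c(unsigned long *hardfault_args);

void hardfault_handler( void )
{
  // Bit 2 of EXC_RETURN (held in LR) says which stack holds the exception
  // frame: 0 = main stack (MSP), 1 = process stack (PSP).
  __asm("tst lr, #4");
  __asm("ite eq \n"
        "mrseq r0, msp \n"
        "mrsne r0, psp");
  // Branch to the C handler with the frame pointer in r0 (first argument).
  __asm("b hard_fault_handler_c");
}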

void hard_fault_handler_c(unsigned long *hardfault_args){
  stacked_r0 = ((unsigned long)hardfault_args[0]);
  stacked_r1 = ((unsigned long)hardfault_args[1]);
  stacked_r2 = ((unsigned long)hardfault_args[2]);
  stacked_r3 = ((unsigned long)hardfault_args[3]);
  stacked_r12 = ((unsigned long)hardfault_args[4]);
  stacked_lr = ((unsigned long)hardfault_args[5]);
  stacked_pc = ((unsigned long)hardfault_args[6]);
  stacked_psr = ((unsigned long)hardfault_args[7]);

  // configurable fault status register
  // consists of MMFSR (bits 7:0), BFSR (bits 15:8) and UFSR (bits 31:16)
  _cfsr = (*((volatile unsigned long *)(0xe000ed28)));

  // hard fault status register
  _hfsr = (*((volatile unsigned long *)(0xe000ed2c)));

  // debug fault status register
  _dfsr = (*((volatile unsigned long *)(0xe000ed30)));

  // auxiliary fault status register
  _afsr = (*((volatile unsigned long *)(0xe000ed3c)));

  // read the fault address registers. these may not contain valid values.
  // check bfarvalid/mmarvalid to see if they are valid values
  // memmanage fault address register
  _mmar = (*((volatile unsigned long *)(0xe000ed34)));
  // bus fault address register
  _bfar = (*((volatile unsigned long *)(0xe000ed38)));

  __asm("bkpt #0\n"); // break into the debugger
}
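
As the comment in the handler says, `_mmar` and `_bfar` only mean something when the matching valid bit is set in CFSR. A minimal sketch of that check follows. The function names are my own invention for illustration; the bit positions (MMARVALID is CFSR bit 7, BFARVALID is CFSR bit 15) are the ARMv7-M ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_MMARVALID (1u << 7)   /* MMFSR bit 7: MMFAR holds a valid address */
#define CFSR_BFARVALID (1u << 15)  /* BFSR bit 7:  BFAR holds a valid address  */

/* Hypothetical helper: true if the MemManage fault address can be trusted. */
bool mmar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_MMARVALID) != 0;
}

/* Hypothetical helper: true if the bus fault address can be trusted. */
bool bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0;
}
```

In the handler above these would be called with `_cfsr` before looking at `_mmar` or `_bfar`. For a pure "undefined instruction" fault neither bit is expected to be set, so both address registers can be ignored.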

To test this, I created the following function. It enables trapping of "division by zero" (and unaligned access) usage errors, but leaves the dedicated usage fault handler disabled so that usage errors are automatically escalated to hard faults:

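void configureFaultHandling(void)
{
  // The dedicated MemManage, BusFault and UsageFault handlers are left
  // disabled so that these faults escalate to HardFault.

//  SCB->SHCSR |= SCB_SHCSR_MEMFAULTENA_Msk
//    | SCB_SHCSR_BUSFAULTENA_Msk
//    | SCB_SHCSR_USGFAULTENA_Msk;

  // Trap division by zero and unaligned accesses as usage faults.
  SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk
    | SCB_CCR_UNALIGN_TRP_Msk;
}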

I added the following to the start of main:

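void main()
{
  configureFaultHandling();

  // volatile keeps the compiler from folding or rejecting the constant
  // division, so a divide instruction is actually executed.
  volatile int zero = 0;
  volatile int a = 5 / zero;
}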

Running this, the program very quickly broke at void hardfault_handler( void ), and I could step through this into hard_fault_handler_c and inspect the values of the registers.

What I found is that these values corresponded to the values shown in IAR's View -> Stack -> Stack 1 pane. This makes sense in hindsight as the documentation states that the values of certain registers get pushed onto the stack when faults occur. However, writing this function helped me figure out which values in the stack corresponded to which registers.
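
To make the mapping explicit, here is a small sketch of the eight-word frame that a Cortex-M3 pushes on exception entry, in the same order that `hard_fault_handler_c` reads it. The type name `exception_frame_t` and the function `read_exception_frame` are hypothetical names I chose for illustration.

```c
#include <stdint.h>

/* Basic exception frame, lowest address first (word index in comments). */
typedef struct {
    uint32_t r0;    /* [0] */
    uint32_t r1;    /* [1] */
    uint32_t r2;    /* [2] */
    uint32_t r3;    /* [3] */
    uint32_t r12;   /* [4] */
    uint32_t lr;    /* [5] return address of the interrupted function */
    uint32_t pc;    /* [6] address of the faulting instruction */
    uint32_t xpsr;  /* [7] */
} exception_frame_t;

/* Hypothetical helper: copy the frame out of the stack words at sp. */
exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t f;

    f.r0   = sp[0];
    f.r1   = sp[1];
    f.r2   = sp[2];
    f.r3   = sp[3];
    f.r12  = sp[4];
    f.lr   = sp[5];
    f.pc   = sp[6];
    f.xpsr = sp[7];
    return f;
}
```

The same indices apply when reading the words by eye in the stack pane, provided the pane starts at the stack pointer value that was current when the handler was entered.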

For my own reference and for others who might have similar issues: I found that the 7th value in "stack 1" (i.e. index 6) corresponded to the program counter value at the time the exception occurred. This is what it looked like for me:

Reading the stack pane directly lets you find the source of a hard fault without needing to override the default hardfault handler, so long as the debugger automatically breaks when the fault occurs.

Hopefully this will also help in pinning down the "undefined instruction" faults.
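
One possible next step, which I have not verified on this board: once the stacked PC is known, compare the bytes at that address with the same address in the linker output, to see whether the instruction in memory is what the compiler emitted. To know how many bytes to compare, it helps to know whether the instruction is 16 or 32 bits wide. In the Thumb instruction set, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The helper name `thumb_is_32bit` is my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if first_halfword starts a 32-bit Thumb
 * instruction, false if it is a complete 16-bit instruction. */
bool thumb_is_32bit(uint16_t first_halfword)
{
    uint16_t top5 = (uint16_t)(first_halfword >> 11);

    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}
```

For example, 0xBE00 (the `bkpt #0` used in the handler) is a 16-bit instruction, while a halfword such as 0xF000 starts a 32-bit one.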