We have a product that uses an MSP430F5436A. It has been in production for several years. Over the past several months we have had reports from production that a few units are not shutting down completely, their power LED is staying on. The power LED is driven by an I2C GPIO expander, it's not connected directly to the MSP430.
When the user hits the power button the MSP430 is supposed to shut down a lot of peripherals, turn off the power LED, and enter LPM2. Periodically a timer interrupt wakes it up from LPM2 to check to see if the power button has been hit to turn the device back on and to do some other house keeping.
After the user hits the power button there is a delay before the device powers off which is driven via a timer ISR. I ripped that out and put in a delay loop. Within that delay loop I am rapidly toggling a different LED that is connected directly to the MSP430.
If I make the loop duration shorter than two seconds the device is able to get into LPM successfully and the LED stops toggling. If I set the loop longer than two seconds I can observe the lock up behavior I am seeing. Sometimes the LED is left in an on or off state, that is random.
I have attempted to disable all maskable interrupts prior to entering the loop. That had no effect. I also put code in the SYSNMI
and UNMI
vectors to detect if either of those were firing and neither is firing. Also, there is no RTOS, this is a cooperative multi-tasking system. So it seems to be locking up while executing the loop its self.
Eventually my device reboots from this lockup but it is due to a watchdog reset. I print the reset source at startup. The device also recovers if I pull out and reinsert the battery. However, the device stays in a locked up state if I pull the reset line low. I also have trouble JTAGGing firmware onto the device once it is locked up.
I've looked in the errata for the MSP430F5436A for mentions of lock up behavior and the only mention isn't relevant.
Unfortunately, due to the packing and construction of the device it is difficult to probe some of the nets, but not impossible.
I can connect to the device with a debugger, but the debugger (CCS 7 + MSP-FET430UIF) looses its connection with the device shortly after the device starts running. That is something that is always true and is a design issue. We have frequent timer interrupts occurring so this behavior may not be avoidable.
The power source for the device is stable and it is operating on my bench and not in a high EMI/EMF environment. However, when this loop occurs some peripherals and power supplies would have been shut down. So it is possible some sort of anomaly is occurring on one of the MSP430's power rails or a GPIO pin.
What could be causing this lock up behavior? Are there any persistent registers I can read on reset that would give me a clue, or are there any pins/nets I can probe that would give me a clue?
Update: I have probed VCC for the MSP430, rails that power peripherals the MSP430 is connected to, and the reset line. I see no changes on any of these at the time of the lock-up.
Edit: The following is the delay loop that I added
dprintf(PRINT_LOG, "Entering delay loop");
__disable_interrupt();
enable_backlight();
for(uint16_t j = 4u; 0u < j; --j)
{
__delay_cycles(14680064/20); // ~50mS delay w/ 14.680064 MHz MCLK
toggle_backlight();
}
disable_backlight();
__enable_interrupt();
dprintf(PRINT_LOG, "Exiting delay loop");
I incorrectly assumed that the hang was occurring within my delay loop because interrupts were disabled, and that the loop wasn't completing. It turns out the loop is running to completion, and after interrupts are enabled an ISR is entered for an event that occurred during the execution of the loop. The hang occurs in that ISR.
Shortly after the delay loop (in time, but somewhere else altogether in code) a GPIO interrupt is disabled. When I power down the device a peripheral that is holding a pin high is shut off, and that pin eventually drops. If I make the delay loop short the loop is exited and that particular interrupt is disabled before the pin drops. If the delay loop is long the pin drops and when the GIE bit is reset by __enable_interrupt();
a context switch to that GPIO interrupt begins.