c, embedded, eeprom, nxp-microcontroller

FlexNVM Data Corruption When Power Cuts Off During Data Write


I am using the NXP S32K148 Microcontroller.

Imagine that power suddenly cuts off while data is being written to the EEPROM (FlexNVM module).

Is there any way to detect the exact data that has been corrupted for this reason?

I am aware of the brown-out detection supported by the MCU, which lets me know if something goes wrong in the middle of a FlexNVM write.

I am also aware that I can use a CRC or redundant bits to check the data on read, but I am looking for a lower-level mechanism in the EEPROM emulator itself that recognizes data corruption.


Solution

  • While data is being written to the EEPROM, a power-off or an unexpected reset can lead to ECC errors for that particular memory region. In S32K1xx there are 2 types of ECC:

    • ECC for single-bit errors -> The MCU detects the fault (within the fault detection time) and then corrects it (within the fault reaction time).
    • ECC for multiple-bit errors -> The MCU treats these as non-correctable errors, so software action is required.

    Let's focus on the second type (multiple-bit errors), because the MCU takes care of the first type on its own. (Even for single-bit errors you can set some registers to get an interrupt telling you that the MCU made a correction, but that is another topic.)

    When the fault occurs, the FERSTAT[DFDIF] flag is set to signal that a double-bit fault was detected, and the flash controller generates a bus fault. After the bus fault has been served, execution goes to the Flash Memory Module interrupt handler. The software can then handle the error depending on whether it occurred in Code Space or Data Space.

    On S32K1xx, the bus fault is disabled by default (and I can bet that no one will enable it). Therefore, it is escalated directly to a hard fault.

    To enable the interrupt that signals a multiple-bit error, set the FERCNFG[DFDIE] bit to 1. With that bit set, the interrupt handler "HardFault_Handler()" is called automatically when such an error occurs.

    The HardFault handler is widely used on the S32K for several general errors: ECC, MPU, etc.

    Knowing this, the first thing to do in HardFault_Handler() is to check why you ended up there. For the ECC error, as I said above, check the FERSTAT[DFDIF] flag. If it is set, you are one step away from getting the exact address of the corrupted data.
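    The enable step and the flag check described above can be sketched as two small helpers. This is only a sketch under stated assumptions: the function names are hypothetical, the registers are passed in as pointers so the logic can also be exercised on a host, and the bit position (bit 1 for both DFDIE and DFDIF) is an assumption that should be verified against the S32K1xx reference manual or the device header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: DFDIE (in FERCNFG) and DFDIF (in FERSTAT) are both bit 1.
 * Verify against the S32K1xx reference manual or device header. */
#define ECC_DOUBLE_FAULT_BIT ((uint8_t)(1u << 1))

/* Hypothetical helper: set DFDIE so that a double-bit ECC fault raises
 * an interrupt. Other bits of the register are preserved. */
void ecc_enable_double_fault_irq(volatile uint8_t *fercnfg)
{
    *fercnfg = (uint8_t)(*fercnfg | ECC_DOUBLE_FAULT_BIT);
}

/* Hypothetical helper: report whether DFDIF is set, i.e. whether the
 * flash module has flagged a double-bit fault. */
bool ecc_double_fault_pending(const volatile uint8_t *ferstat)
{
    return (*ferstat & ECC_DOUBLE_FAULT_BIT) != 0u;
}
```

    On the target, the pointers would refer to the FERCNFG and FERSTAT registers of the flash module as declared in your device header; the first helper would be called once during initialization and the second at the top of HardFault_Handler().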

    Knowing the address, I think you can perform some additional operations to replace the data with backup data, default data, or older versions of that data. This is another topic.
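    As a rough illustration of the "replace with default data" idea, the sketch below maps a faulting address to a record index and restores that record from a table of defaults. Everything here is hypothetical: the fixed-size record layout, the names, and the fact that it operates on a RAM working copy. Writing the restored record back to the emulated EEPROM is device-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: fixed-size records stored back to back. */
#define RECORD_SIZE  16u
#define RECORD_COUNT 4u

/* Returns the index of the record that contains fault_addr, or -1 if the
 * address lies outside the record area that starts at eee_base. */
int record_index_for_address(uint32_t eee_base, uint32_t fault_addr)
{
    if (fault_addr < eee_base) {
        return -1;
    }
    uint32_t offset = fault_addr - eee_base;
    if (offset >= RECORD_SIZE * RECORD_COUNT) {
        return -1;
    }
    return (int)(offset / RECORD_SIZE);
}

/* Overwrites one record in a RAM working copy with its default value.
 * The caller must pass an index in the range 0..RECORD_COUNT-1. */
void restore_record_default(uint8_t *records, const uint8_t *defaults,
                            int index)
{
    size_t start = (size_t)index * RECORD_SIZE;
    memcpy(records + start, defaults + start, RECORD_SIZE);
}
```

    Restoring a whole record rather than a single word keeps the record internally consistent, which matters if each record carries its own CRC.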

    The Arm Cortex core has a set of registers that may be useful in your case. I'm talking about the S32_SCB registers.

    1. BusFault Status Register (BFSR) indicates the cause of a bus access fault. Flags:
    • BFARVALID = 1 -> the BFAR holds a valid fault address;
    • PRECISERR = 1 -> a data bus error has occurred, and the PC value stacked for the exception return points to the instruction that caused the fault. When the processor sets this bit to 1, it writes the faulting address to the BFAR.
    2. BusFault Address Register (BFAR) contains the address of the location that generated the BusFault.
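    The two flags and the address register can be combined as in the minimal sketch below. The function name is hypothetical, and the register values are passed in as plain integers so the decoding can be checked off-target. The bit positions are the architectural ones for Cortex-M3/M4, where the BFSR occupies bits 15:8 of the Configurable Fault Status Register (CFSR).

```c
#include <stdbool.h>
#include <stdint.h>

/* BFSR flags expressed as positions within the 32-bit CFSR. */
#define CFSR_PRECISERR (1u << 9)   /* BFSR bit 1 */
#define CFSR_BFARVALID (1u << 15)  /* BFSR bit 7 */

/* Hypothetical helper: returns true and stores the faulting address only
 * when the fault was precise and BFAR is flagged as valid. Otherwise
 * *addr is left untouched and false is returned. */
bool bus_fault_address(uint32_t cfsr, uint32_t bfar, uint32_t *addr)
{
    const uint32_t required = CFSR_PRECISERR | CFSR_BFARVALID;

    if ((cfsr & required) != required) {
        return false;
    }
    *addr = bfar;
    return true;
}
```

    In HardFault_Handler() the inputs would be read from the system control block (CFSR at 0xE000ED28 and BFAR at 0xE000ED38), ideally before any fault status flags are cleared, since BFAR is only meaningful while BFARVALID is set.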

    Helpful links: