Weird memory corruption issue, FreeRTOS, STM32F777II


I am currently developing embedded firmware that uses FreeRTOS on an STM32F777II microcontroller. Resource-wise, I have around 10 tasks at the same priority (the total stack size of all tasks is under 40 KB), around 4 queues of 1 KB each, and 4 binary semaphores. I know this question is incomplete without the actual code, but there is no specific portion of my firmware that I think is worth sharing for this issue, and the code contains a lot of business logic that I cannot share anyway.

I have a struct that consists of multiple char and int arrays of fixed length. Four of the tasks each use one instance of this struct. Each instance takes around 15 KB and is defined globally in the FreeRTOS application, not locally inside a task. The instances are allocated statically, never dynamically at runtime, and since I initialize a few members in the declaration, they should go into the .data section, if I am not mistaken. Until now the code had no problems whatsoever and worked reliably.

Recently I had a new requirement to add the same struct to 2 more tasks. I added one more 15 KB instance for one of these tasks, only declaring and initializing it without processing it in any task, and observed no problems at all: no data corruption, nothing. However, as soon as I declared one more variable of the same type, I saw data corruption in many other places in the project. Some of the queues stopped working correctly and returned garbage when read, and some other buffers were corrupted as well. If I remove this one allocation, everything goes back to normal.

I really do not understand why one more variable of this struct triggers so much corruption elsewhere. My MCU has 512 KB of RAM, and the IDE's build analyzer shows RAM usage below 40%. What could be triggering this, and what should I try? Could it be some overlap of the .data and .bss sections? I did not observe any stack overflows or hard faults while this was happening.
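To illustrate the .data/.bss point, here is a sketch with a made-up struct (`task_ctx_t`, its fields, their sizes, and the helper `ctx_ram_bytes` are hypothetical, chosen to total about 15 KB). With GCC, an object that has any non-zero initializer is normally emitted whole into .data, so its initial image is also stored in flash and copied to RAM at startup, while an object without an initializer goes to .bss and is zeroed by the startup code. In both cases, members not named in the initializer are zero and the RAM cost is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task struct; field names and sizes are illustrative only. */
typedef struct {
    char     name[32];
    char     text[8192];
    int32_t  samples[1792];   /* 1792 * 4 = 7168 bytes */
    uint32_t count;
} task_ctx_t;

/* Compile-time guard against the struct silently growing past 16 KB. */
static_assert(sizeof(task_ctx_t) <= 16u * 1024u, "task_ctx_t larger than expected");

/* Non-zero initializer: with GCC the whole object normally lands in .data,
 * and its initial image also occupies flash. */
task_ctx_t ctx_a = { .name = "task_a", .count = 1u };

/* No initializer: normally placed in .bss and zeroed by the startup code. */
task_ctx_t ctx_b;

/* RAM consumed by n instances, regardless of .data or .bss placement. */
size_t ctx_ram_bytes(size_t n)
{
    return n * sizeof(task_ctx_t);
}
```

Six instances of this example struct need 92,376 bytes of RAM whichever section they land in, and objects the linker places after them in the same section shift by that amount.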


Solution

  • Quick workaround:

    Almost at random, I disabled the D-cache by commenting out this call:

    SCB_EnableDCache();
    

    and everything started working correctly, with no further instances of data corruption.

    For a correct, long-term fix:

    This points to latent bugs in my code that the additional struct merely exposed. I need to review how memory is used and which memory regions have different properties (for example, cacheable or not), look at the buses, and review all DMA usage and the MPU settings. I also need to check that volatile is used correctly, that data shared between tasks is accessed in a thread-safe way, and that cached data stays coherent, using memory barriers and cache clean/invalidate operations where appropriate.

    More details: ST application note AN4839, "Level 1 cache on STM32F7 Series and STM32H7 Series".
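The review items about memory regions and cache maintenance can be made concrete with a small, host-testable sketch. It rests on two facts about the STM32F76x/77x: the first 128 KB of RAM (0x20000000) is DTCM, which is not cached, while SRAM1 and SRAM2 above it go through the L1 D-cache under the default memory attributes; and the Cortex-M7 cleans and invalidates whole 32-byte lines. One plausible but unconfirmed explanation for the symptom is that, if the linker script starts .data and .bss at 0x20000000, the extra 15 KB struct pushed other buffers (for example DMA targets) out of DTCM into cacheable SRAM1. All names below (`ram_region_of`, `cache_line_range`, `dma_buffer_is_cache_safe`, the `F7_*` macros) are hypothetical helpers, not part of the ST HAL or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* STM32F76x/77x internal SRAM map (see reference manual RM0410). */
#define F7_DTCM_BASE   0x20000000u  /* 128 KB DTCM: not cached by the L1 D-cache */
#define F7_SRAM1_BASE  0x20020000u  /* 368 KB SRAM1: cacheable by default */
#define F7_SRAM2_BASE  0x2007C000u  /* 16 KB SRAM2: cacheable by default */
#define F7_SRAM_END    0x20080000u  /* one past the last SRAM byte */

static_assert(F7_SRAM1_BASE - F7_DTCM_BASE == 128u * 1024u, "DTCM is 128 KB");
static_assert(F7_SRAM2_BASE - F7_SRAM1_BASE == 368u * 1024u, "SRAM1 is 368 KB");
static_assert(F7_SRAM_END - F7_SRAM2_BASE == 16u * 1024u, "SRAM2 is 16 KB");

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 L1 D-cache line size in bytes */

typedef enum { RAM_OTHER, RAM_DTCM, RAM_SRAM1, RAM_SRAM2 } ram_region_t;

typedef struct {
    uint32_t addr;  /* start of the first cache line touched */
    uint32_t size;  /* whole lines, a multiple of DCACHE_LINE_SIZE */
} cache_range_t;

/* Which internal RAM block an address falls in. */
ram_region_t ram_region_of(uint32_t addr)
{
    if (addr >= F7_DTCM_BASE && addr < F7_SRAM1_BASE)  return RAM_DTCM;
    if (addr >= F7_SRAM1_BASE && addr < F7_SRAM2_BASE) return RAM_SRAM1;
    if (addr >= F7_SRAM2_BASE && addr < F7_SRAM_END)   return RAM_SRAM2;
    return RAM_OTHER;
}

/* True if the CPU reaches this address through the D-cache, assuming the
 * default memory attributes (no MPU region marking it non-cacheable). */
bool is_dcache_affected(uint32_t addr)
{
    ram_region_t region = ram_region_of(addr);
    return region == RAM_SRAM1 || region == RAM_SRAM2;
}

/* Expand [addr, addr + size) to the whole cache lines it touches; a
 * by-address clean or invalidate always acts on complete lines. */
cache_range_t cache_line_range(uint32_t addr, uint32_t size)
{
    assert(size > 0u);
    uint32_t start = addr & ~(DCACHE_LINE_SIZE - 1u);
    uint32_t end = (addr + size + DCACHE_LINE_SIZE - 1u) & ~(DCACHE_LINE_SIZE - 1u);
    cache_range_t range = { start, end - start };
    return range;
}

/* A DMA buffer owns all of its cache lines only if its start address and its
 * length are both line-aligned. Otherwise, invalidating it can discard pending
 * CPU writes to a neighbouring variable that shares the first or last line. */
bool dma_buffer_is_cache_safe(uint32_t addr, uint32_t size)
{
    return (addr % DCACHE_LINE_SIZE) == 0u && (size % DCACHE_LINE_SIZE) == 0u;
}
```

On target, the same checks can be applied to `(uint32_t)&buffer` for each DMA buffer or queue storage area. A buffer that lands in cacheable SRAM would then need 32-byte alignment and padding (for example with GCC's `__attribute__((aligned(32)))`), plus CMSIS `SCB_CleanDCache_by_Addr` before a DMA transfer reads it and `SCB_InvalidateDCache_by_Addr` after a DMA transfer writes it; the alternatives are placing it in DTCM or in an MPU region configured as non-cacheable.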