linux gdb embedded preempt-rt

Investigating thread stack overflow


I am experiencing a segmentation fault when running my multi-threaded embedded application. GDB gave me a hint that the stack could be corrupt, which led me to believe the stack is too small for the problematic thread. Increasing the stack size seems to remove the issue, but I would like to confirm it a bit further. What are my options here? Is it possible to find out the current stack size at the time of the segfault?


Solution

  • With gcc, compile with -fstack-usage. That causes a .su file to be output for each object file, containing a plain-text stack report for each function, like:

    main.c:36:6:bar    48    static
    main.c:41:5:foo    88    static
    main.c:47:5:main    8    static
    

    However, that reports only the stack frame of each function on its own; the stack usage of a function is its own frame plus the frames of every function on the deepest call path below it. Working that out by hand for all possible call paths to determine the worst-case stack depth is not practical for any non-trivial application. You need a tool that can inspect the call graph and use the .su data to work that out for you. One example is a Perl script that combines the output of objdump and the .su files to generate a full stack report like:

      Func                               Cost    Frame   Height
    ------------------------------------------------------------------------
    > main                                176       12        4
      foo                                 164       92        3
      bar                                  72       52        2
    > INTERRUPT                            28        0        2
      __vector_I2C1                        28       28        1
      foobar                               20       20        1
    R recursiveFunct                       20       20        1
      __vector_UART0                       12       12        1
    
    Peak execution estimate (main + worst-case IV):
      main = 176, worst IV = 28, total = 204
    

    The stack usage for your thread will be the stack usage of its entry-point/thread function, plus perhaps some margin for whatever thread overhead the OS may require.

    Note that calls through function-pointers and recursion will defeat this method, so you may need to assess that separately by considering the stack usage of the functions called and the depth of recursion likely.

    The answer to "How to determine maximum stack usage in embedded system with gcc?" may also be useful.

    To help detect stack issues at runtime, gcc has various instrumentation options related to stack checking and protection; see https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Instrumentation-Options.html#Instrumentation-Options

    Knowing the "current stack size" at the point you get a segfault is not particularly helpful. It won't tell you how much stack is needed; it only tells you how far out of bounds the access happened to be when the MMU trapped the fault, which is likely to be as soon as the code accesses memory outside the allocated stack space, within the resolution of the page size. It just tells you your stack is not big enough, which you kind of knew already.

    A "dynamic" technique for stack analysis is to "oversize" the stack and fill it with a single byte value. Then, after running the code through a test sequence designed to exercise all likely call paths, you inspect the stack region to see where the "high-tide mark" is relative to the start of the region, and "right-size" your stack accordingly. That is a common technique, but it depends on exercising all likely paths. Typically error and exception handling paths are omitted, so you can end up getting a stack overflow just when your code is trying to handle some other error. It's risky.