Tags: gdb, stack, profiling, valgrind, benchmarking

Measure the peak stack-pointer value and its PC location


For an analysis of different binaries, I need to measure the peak actual stack memory usage (not just the stack pages reserved, but the memory actually used). I was trying the following with gdb:

# track the lowest SP seen so far, and the PC where it occurred
set $spnow = $sp
set $pcnow = $pc
watch $sp
commands
silent
if $sp < $spnow
  set $spnow = $sp
  set $pcnow = $pc
  print $spnow
  print $pcnow
end
c
end

This appears to "work" when applied to ls, but even for a program as short-running as ls it never seems to finish; it sits stuck in functions like strcoll_l () from /usr/lib/libc.so.6. The methodology is probably just too slow.

I also looked into the valgrind massif tool. It can profile stack usage, but unfortunately it doesn't seem able to report in which part of the program the peak usage was encountered.
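For reference, massif's stack profiling is invoked like this (a sketch; it assumes valgrind and its ms_print tool are installed, and skips itself if they are not):

```shell
# Sketch: massif with --stacks=yes records total stack size per snapshot,
# but the peak snapshot carries no PC/backtrace for the stack itself.
if command -v valgrind >/dev/null 2>&1; then
    valgrind --tool=massif --stacks=yes \
             --massif-out-file=massif.out /bin/ls >/dev/null 2>&1
    ms_print massif.out | head -n 25   # peak snapshot: heap + stack totals
    rm -f massif.out
fi
```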


Solution

  • For an analysis of different binaries, I need to measure the peak actual stack memory usage

    Your GDB approach

    • works only for single-threaded programs
    • is too slow to be practical (the watch $sp command forces GDB to single-step your program).

If you only care about stack usage at page granularity (and I think you should -- does it really matter whether the program used 1024 or 2000 bytes of stack?), then a much faster approach is to run the program in a loop, reducing its ulimit -s until it fails. You could also binary search: start with the default 8MB, then try 4MB, 2MB, 1MB, 512KB, etc. until the program fails, then increase the limit again to narrow down the exact value.

    For /bin/ls:

    bash -c 'x=4096; while /bin/ls > /dev/null; do
             echo $x; x=$(($x/2)); ulimit -s $x || break; done'
    4096
    2048
    1024
    512
    256
    128
    64
    32
    bash: line 1: 109951 Segmentation fault      (core dumped) /bin/ls > /dev/null
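The binary-search variant mentioned above could be sketched as follows (the 16 KB granularity and the assumption that /bin/ls fails somewhere between a 16 KB and 8 MB stack limit are mine, not part of the original answer):

```shell
# Binary-search the smallest stack limit (ulimit -s, in KB) under which
# /bin/ls still runs. lo is assumed failing, hi assumed succeeding.
lo=16        # assumed too small for the process to even start
hi=8192      # typical default soft limit
while [ $((hi - lo)) -gt 16 ]; do
    mid=$(( (lo + hi) / 2 ))
    if ( ulimit -s "$mid" && exec /bin/ls ) >/dev/null 2>&1; then
        hi=$mid   # ran fine: the peak fits in $mid KB
    else
        lo=$mid   # crashed: the peak needs more than $mid KB
    fi
done
echo "peak stack usage is between ${lo}KB and ${hi}KB"
```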
    

    You can then find the $PC by looking at the core dump.
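Extracting that PC from the core might look like the following (a sketch: the core file's name and location vary by system -- on systemd distributions coredumpctl manages them -- so the plain ./core path here is an assumption, and the snippet skips itself if gdb or the core is absent):

```shell
# Sketch: print the backtrace and the faulting PC/SP from a core dump.
if command -v gdb >/dev/null 2>&1 && [ -f core ]; then
    gdb -batch -ex bt -ex 'info registers pc sp' /bin/ls core
fi
```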

I need the precise limits because I want to figure out which compiler optimizations cause which micro-changes to stack usage (even in the byte range, along with .data and .text sizes).

    I believe it's a fool's errand to attempt that.

    In my experience, stack use is most affected by compiler inlining decisions. These in turn are most affected by precise compiler version and tuning, presence of runtime information (for profile-guided optimization), and precise source of the program being optimized.

A single yes/no change to an inlining decision can increase stack use by hundreds of KBs in recursive programs, and minuscule changes to any of the above factors can flip that decision.