For an analysis of different binaries, I need to measure the peak actual stack memory usage (not just the stack pages reserved, but the memory actually used). I tried the following with gdb:
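# run after the program has been started (e.g. with "start"), so that $sp is valid
set $spnow = $sp
watch $sp
commands
silent
# the stack grows downwards, so a smaller $sp means a new peak
if $sp < $spnow
set $spnow = $sp
set $pcnow = $pc
print $spnow
print $pcnow
end
c
end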
This appears to "work" when applied to ls, except that even for a program as short-running as ls, it doesn't actually appear to progress; it stays stuck in functions like "in strcoll_l () from /usr/lib/libc.so.6". Probably this methodology is just too slow.
I also looked into the valgrind massif tool. It can profile stack usage, but unfortunately it can't seem to report in which part of the program the peak usage was encountered.
For an analysis of different binaries, I need to measure the peak actual stack memory usage
Your GDB approach is slow (the watch $sp command forces GDB to single-step your program).
If you only care about stack usage at page granularity (and I think you should -- does it really matter whether the program used 1024 or 2000 bytes of stack?), then a much faster approach is to run the program in a loop, reducing its ulimit -s while the program still runs successfully. You could also binary search: start with the default 8MB, then try 4, 2, 1, 512K, etc. until it fails, then increase the stack limit to find the exact value.
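The binary search mentioned above could be sketched roughly as follows. This is only an illustration: min_stack_kb is a made-up helper name, the 8192 KB upper bound is assumed to be large enough for the program to succeed, and the search assumes that success is monotonic in the stack limit. At very small limits the failure may come from the exec itself rather than from the program's own stack use.

```shell
# Hypothetical helper: binary search for the smallest "ulimit -s" value (in KB)
# at which the given command still exits successfully.
# The function body runs in a subshell, so its variables stay local.
min_stack_kb() (
    lo=0      # limit assumed to fail
    hi=8192   # limit assumed to succeed (the usual 8 MB default)
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        # Lower the limit in a nested subshell so it does not stick to the caller.
        if ( ulimit -s "$mid" && "$@" ) >/dev/null 2>&1; then
            hi=$mid   # still works: try a smaller stack
        else
            lo=$mid   # failed: needs a larger stack
        fi
    done
    echo "$hi"
)
```

Something like min_stack_kb /bin/ls would then print the smallest limit in KB that still worked, at the cost of about 13 runs of the program.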
For /bin/ls, a simple halving loop gives:
bash -c 'x=4096; while /bin/ls > /dev/null; do
echo $x; x=$(($x/2)); ulimit -s $x || break; done'
4096
2048
1024
512
256
128
64
32
bash: line 1: 109951 Segmentation fault (core dumped) /bin/ls > /dev/null
You can then find the $PC by looking at the core dump.
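As a rough sketch of that last step, GDB can be run non-interactively on the core file. The helper name crash_pc and the core file name used below are assumptions; where the core file actually ends up depends on the system's core_pattern setting, and core dumps may first need to be enabled with ulimit -c unlimited.

```shell
# Hypothetical helper: print the crash-time $pc and $sp plus the innermost
# frames from a core file.
#   $1 = the binary that crashed
#   $2 = the core file (name and location are system-dependent)
crash_pc() {
    gdb -batch -ex 'info registers pc sp' -ex 'backtrace 5' "$1" "$2"
}

# Example, assuming a core file named "core" in the current directory:
#   crash_pc /bin/ls core
```

The $pc shown is the instruction that faulted when the stack ran out, which should be at or near the point of peak stack use.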
I need the precise limits because I want to figure out which compiler optimizations cause which micro-changes to stack usage (even in the byte range, along with .data and .text sizes).
I believe it's a fool's errand to attempt that.
In my experience, stack use is most affected by compiler inlining decisions. These in turn are most affected by precise compiler version and tuning, presence of runtime information (for profile-guided optimization), and precise source of the program being optimized.
A yes/no change to inlining decision can increase stack use by 100s of KBs in recursive programs, and minuscule changes to any of the above factors can change that decision.