I have an existing project that we compile with DEBUG for developers (and with -O0 so that lldb makes sense). One function in particular balloons in size when -O0 is used:
-O2 -Wframe-larger-than=100
warning: stack frame size of 168 bytes in function 'dsl_scan_visitbp'
-O0 -Wframe-larger-than=100
warning: stack frame size of 1160 bytes in function 'dsl_scan_visitbp'
and with some recursion, the stack can easily be exhausted (kernel stacks are 16K).
The first thing to inspect is the local variables, but I believe there are only two:
dsl_pool_t *dp = scn->scn_dp;
blkptr_t *bp_toread = NULL;
If you want to see the whole function: https://github.com/openzfs/zfs/blob/master/module/zfs/dsl_scan.c#L1908 (Linux sources, but dealing with Apple clang port)
There are a bunch of always_inline functions in that source file, which may also come into play here.
But I am curious why it grows so large with -O0?
As for what to do about it: I can't find any Apple clang #pragma that turns optimization "on" for a single function or a single file (only ones that turn it off). If I knew what the cause was, perhaps I could control that specific issue with a different pragma.
The only solution I see right now is to have dsl_scan.c processed differently in the Makefile, so that only that file always gets -O2. But that is a bit tedious.
I'm not familiar with the code base, so I don't see any obvious variables that would take up large amounts of stack space. However, I notice that the functions (including the always_inline ones) are quite long. Typically, in debug builds, every variable and temporary expression result is assigned its own slot in the stack frame, regardless of scope. So even if two variables' lifetimes do not overlap (e.g. one is declared in the if block and another in the else block), they are allocated separate space in memory. This can add up when there are a lot of small, short-lived variables and temporary values.
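To make the effect concrete, here is a minimal sketch, not taken from dsl_scan.c. The names record_t and sum_for_kind are hypothetical, and the 128-byte struct just stands in for any sizeable local. Each branch declares its own local whose lifetime does not overlap with the others:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-byte record standing in for a large on-stack struct. */
typedef struct {
	uint8_t bytes[128];
} record_t;

/*
 * Each branch declares its own record_t. The three lifetimes never
 * overlap, so an optimizer is free to share one slot between them.
 * Without optimization the compiler typically reserves a separate
 * slot per variable, so the frame grows with the number of branches.
 */
uint32_t
sum_for_kind(int kind)
{
	uint32_t sum = 0;

	assert(kind >= 0 && kind <= 2);	/* hypothetical kinds 0..2 */

	if (kind == 0) {
		record_t a;
		memset(&a, 1, sizeof (a));
		for (size_t i = 0; i < sizeof (a.bytes); i++)
			sum += a.bytes[i];
	} else if (kind == 1) {
		record_t b;
		memset(&b, 2, sizeof (b));
		for (size_t i = 0; i < sizeof (b.bytes); i++)
			sum += b.bytes[i];
	} else {
		record_t c;
		memset(&c, 3, sizeof (c));
		for (size_t i = 0; i < sizeof (c.bytes); i++)
			sum += c.bytes[i];
	}
	return (sum);
}
```

Compiling this once with -O0 -Wframe-larger-than=100 and once with -O2 -Wframe-larger-than=100 should show the difference in reported frame size, although the exact numbers depend on the compiler version and target.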
You are probably best off disabling the always_inline attributes, in debug builds, on all functions called by this function. Inlining folds each callee's locals into the caller's stack frame, so the recursive function ends up pre-allocating memory for all possible branches of execution, even branches that are never taken and locals declared in a function that is not itself involved in the recursion.
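One way to do that without touching every call site is to hide the attribute behind a macro. The following is only a sketch under assumptions: SCAN_INLINE, helper_cost and visit are hypothetical names, and I am assuming debug builds define a DEBUG macro; the real code base may spell both differently.

```c
#include <assert.h>

/*
 * Hypothetical wrapper macro. In debug builds the helper is a plain
 * static function, so its locals live in its own frame and are only
 * on the stack while it runs. In optimized builds it is force-inlined
 * as before.
 */
#ifdef DEBUG	/* assumption: debug builds define DEBUG */
#define	SCAN_INLINE	static
#else
#define	SCAN_INLINE	static inline __attribute__((always_inline))
#endif

/* Helper with some scratch space; not part of the recursion itself. */
SCAN_INLINE int
helper_cost(int depth)
{
	int scratch[32];	/* stands in for the helper's locals */

	assert(depth > 0);
	for (int i = 0; i < 32; i++)
		scratch[i] = depth + i;
	return (scratch[31] - scratch[0]);	/* always 31 */
}

/*
 * Recursive function. When helper_cost() is inlined, scratch[] becomes
 * part of every visit() frame; when it is not, each level of recursion
 * only pays for visit()'s own locals.
 */
int
visit(int depth)
{
	if (depth <= 0)
		return (0);
	return (helper_cost(depth) + visit(depth - 1));
}
```

As far as I know, clang at -O0 still honours always_inline but does not otherwise inline, so simply dropping the attribute should be enough to keep the helper's locals out of the recursive frame. Checking with -Wframe-larger-than before and after would confirm whether it helps in your case.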