Post-Answer-Acceptance Summary: The problem was the use of a pointer to a stack variable that had gone out of scope. It had nothing to do with optimization. It is a pity that valgrind can't find stack errors...
I have a segfault that appears only when enabling -O1 level optimization in gcc 4.4.4 (CentOS 5.5). All other optimization levels (0,2,3,s) are fine. I haven't managed to create a reduced test case for it yet, but it appears to be related to an array offset calculation causing the stack to be overwritten.
If I enable -O1 and disable all optimizations that have a flag (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) the bug still occurs.
If I use -O2 (or any other level) there is no problem. If I use -O2 and disable strict aliasing with -fno-strict-aliasing, then the segfault returns.
Edit: If I add -fstack-protector-all to the build flags (either -O1 or -O2 -fno-strict-aliasing) the segfault disappears.
So it appears to be caused by an optimization that happens by default in O1 that is disabled by strict-aliasing.
I suspect that this is a compiler bug (but without a reduced test case I can't prove it). This is a production server that needs a quick turnaround. The normal optimization level is -O1 and I'm loath to just change it to -O2, as the fix might be more dangerous than the original problem.
I would really appreciate some suggestions. Currently I'm thinking to try compiling gcc 4.4.6 and seeing if that fixes it. However not knowing for sure what is causing the problem is a little worrying.
Edit: the server is compiled with -Wall -Werror (and a few others). It runs without error in valgrind (valgrind checks heap accesses and this appears to be a stack-related error).
Often, compiler optimizations expose invalid or undefined behavior in the source code that only happened to work by luck before. A few things I would try:
Compile with -Wall -Wextra
Run under valgrind to see if you can get more of a hint of where the error is
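For illustration, the kind of bug named in the summary at the top (a pointer to a stack variable that has gone out of scope) might look roughly like the sketch below. The struct and function names are hypothetical and not taken from the asker's server. The buggy variant is guarded by a made-up SHOW_BUG macro so the file compiles cleanly by default. Because the dangling pointer refers to stack memory that later calls reuse, a change in optimization flags can change the stack layout, and that is enough to make a crash appear or disappear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record that borrows a pointer to a name string. */
struct request {
    const char *name;
};

#ifdef SHOW_BUG
/* BUG (not compiled by default): buf lives in this function's stack frame,
 * so req->name dangles as soon as the function returns. Reading it later is
 * undefined behavior, and whether it crashes depends on how the compiler
 * happened to lay out and reuse the stack. */
void fill_request_buggy(struct request *req, int id)
{
    char buf[32];
    snprintf(buf, sizeof buf, "req-%d", id);
    req->name = buf;
}
#endif

/* Fix: the caller passes in storage whose lifetime covers every use of req. */
void fill_request(struct request *req, char *storage, size_t len, int id)
{
    assert(req != NULL && storage != NULL && len > 0);
    snprintf(storage, len, "req-%d", id);
    req->name = storage;
}
```

A caller would declare the buffer in the same scope as the struct (for example a char storage[32] next to the struct request) and pass both to fill_request, so the string stays valid for as long as the struct is used.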