I'm using gcc 12.2.0 on x86_64, compiling 64-bit code. I've run into an odd issue that is causing me problems and have reduced it to a minimal reproducer:
#include <stdint.h>
#include <stdbool.h>

struct foobar_t {
    uint8_t data[512];
};

void my_memset(void *target) {
#if 1
    for (int i = 0; i < 256; i++) {
        ((uint16_t*)target)[i] = 0xabcd;
    }
#else
    for (int i = 0; i < 512; i++) {
        ((uint8_t*)target)[i] = 0xab;
    }
#endif
}

int main() {
    struct foobar_t foobar;
    my_memset(&foobar);
    if (foobar.data[123] == 0) {
        volatile int x = 0;
    }
    return 0;
}
When the #if 1 path is taken, I get a compiler warning:
$ gcc -O3 -fno-stack-protector -Wall -c -o x.o x.c
[...]
x.c:46:24: warning: ‘foobar’ is used uninitialized [-Wuninitialized]
46 | if (foobar.data[123] == 0) {
That warning completely disappears when I use the second code path (#if 0), where the only difference is that the first path sets 256 16-bit words while the second sets 512 bytes.
In the case that I get the warning, the generated assembly also looks wrong:
0000000000000000 <my_memset>:
0: f3 0f 1e fa endbr64
4: 66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 # c <my_memset+0xc>
c: 48 8d 87 00 02 00 00 lea 0x200(%rdi),%rax
13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
18: 0f 11 07 movups %xmm0,(%rdi)
1b: 48 83 c7 10 add $0x10,%rdi
1f: 48 39 f8 cmp %rdi,%rax
22: 75 f4 jne 18 <my_memset+0x18>
24: c3 ret
0000000000000030 <main>:
30: f3 0f 1e fa endbr64
34: 48 81 ec a0 01 00 00 sub $0x1a0,%rsp
3b: 66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 # 43 <main+0x13>
43: 48 8d 44 24 98 lea -0x68(%rsp),%rax
48: 48 8d 94 24 98 01 00 lea 0x198(%rsp),%rdx
50: 0f 29 00 movaps %xmm0,(%rax)
53: 48 83 c0 10 add $0x10,%rax
57: 48 39 c2 cmp %rax,%rdx
5a: 75 f4 jne 50 <main+0x20>
5c: 80 7c 24 13 00 cmpb $0x0,0x13(%rsp)
61: 75 08 jne 6b <main+0x3b>
63: c7 44 24 94 00 00 00 movl $0x0,-0x6c(%rsp)
6b: 31 c0 xor %eax,%eax
6d: 48 81 c4 a0 01 00 00 add $0x1a0,%rsp
74: c3 ret
This only reserves 0x1a0 bytes on the stack, 416 bytes. That does not fit the structure! How can that be? What is the reason for this happening?
I've tried removing as much code as possible while still retaining the warning. If I disable optimization, the warning also goes away.
Your #if 1 code has undefined behavior because it violates the strict aliasing rule. Very roughly speaking, subject to certain narrow exceptions, you must not access the same memory through pointers to two different types. Here, foobar.data is declared as an array of uint8_t, but the loop writes to it through a uint16_t pointer.
As such, the compiler is entitled to assume that accesses to memory through one pointer type aren't seen by accesses through another pointer type. So it's not surprising that it would think that foobar is uninitialized, since it doesn't consider the possibility that an access to a uint16_t object could touch it.
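To illustrate the kind of assumption involved, here is a small sketch that is not taken from your code; the function name store_and_reload and its parameters are made up for the example. Because int and float are incompatible types, an optimizing compiler may assume the two pointers refer to different objects and return the constant 1 without reloading *counter.

```c
/* Hypothetical example: the names below do not appear in the question. */
int store_and_reload(int *counter, float *scale) {
    *counter = 1;
    *scale = 0.0f;   /* assumed not to modify *counter: the types are incompatible */
    return *counter; /* so the compiler may fold this load into the constant 1 */
}
```

Calling it with two genuinely distinct objects is well defined and returns 1. Passing the same address through both parameters (via a cast) would be undefined behavior, which is the situation your #if 1 loop creates for foobar.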
There is an exception in the standard for character types, precisely so that you can implement things like memset and memcpy using character pointers. So your #else code is legal, and in fact the compiler is able to recognize that the my_memset code does initialize foobar, and so you don't get the warning. (Strictly speaking your code ought to use unsigned char instead of uint8_t - they are typedef'd the same on most compilers, but the language standard does not guarantee that to be the case.)
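If you do want to store a 16-bit pattern, one way that stays within the rules is to copy the bytes of the pattern with memcpy instead of dereferencing a uint16_t pointer. The following is only a sketch; fill_u16 is a hypothetical helper name, not something from your code, and I have not checked what gcc 12.2.0 generates for it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores `count` copies of a 16-bit pattern at `target`
 * without ever accessing the destination through a uint16_t lvalue. */
void fill_u16(void *target, uint16_t pattern, size_t count) {
    unsigned char *bytes = target;
    for (size_t i = 0; i < count; i++) {
        /* memcpy copies the object representation byte by byte, which is
         * allowed for an object of any type. */
        memcpy(bytes + i * sizeof pattern, &pattern, sizeof pattern);
    }
}
```

In your reproducer the call would be fill_u16(&foobar, 0xabcd, 256). The bytes end up in the machine's native byte order, the same as with your original loop.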
The thing about "insufficient stack" is actually normal and not a problem. The object foobar is located on the stack from offset rsp-0x68 to rsp+0x198, which is precisely 512 bytes (0x198 + 0x68 = 0x200), just as it should be. It may look strange that part of it is below the stack pointer, but this is okay because it is within the 128-byte red zone that the x86-64 System V ABI reserves below the stack pointer.
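The arithmetic can be double-checked with a few lines of C. This is just a sketch that hard-codes the offsets read off the disassembly of main in your question; the constant and function names are made up for the example.

```c
/* Offsets relative to rsp, taken from the disassembly in the question. */
enum {
    FOOBAR_START = -0x68, /* lea -0x68(%rsp),%rax */
    FOOBAR_END   = 0x198, /* lea 0x198(%rsp),%rdx */
    RED_ZONE     = 128    /* red zone size in the x86-64 System V ABI */
};

/* Number of bytes covered by the store loop: 0x200 = 512. */
long foobar_size(void) { return FOOBAR_END - FOOBAR_START; }

/* rsp-relative offset of foobar.data[index]. */
long data_offset(long index) { return FOOBAR_START + index; }

/* Bytes of foobar that lie below rsp; this has to fit in the red zone. */
long bytes_below_rsp(void) { return -FOOBAR_START; }
```

data_offset(123) comes out as 0x13, which matches the cmpb $0x0,0x13(%rsp) instruction that tests foobar.data[123], and bytes_below_rsp() is 104, below the 128-byte limit.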
The red zone is only usable in leaf functions (i.e. those which don't call other functions), so it can only be used in main if the call to my_memset is inlined. This isn't done when optimizations are off, so you don't see the red zone used in that case.
Using the red zone doesn't really accomplish much in this example. The main benefit is in functions where, by using the red zone, you avoid having to adjust the stack pointer at all. Here, the stack pointer would have to be adjusted anyway, so we haven't gained anything in comparison to the more natural implementation of subtracting a full 512 bytes from the stack pointer. But the code with the red zone is still perfectly valid and equivalent in terms of performance; it just looks funny. So this is just a slightly odd quirk of the compiler's stack layout algorithm.