Search code examples
cmemoryx86-64memory-alignment

Why is malloc 16 byte aligned?


The GNU documentation states that malloc is aligned to 16 byte multiples on 64 bit systems. Why is this?

If my understanding is correct, registers and all instructions operate on values that are a maximum of 8 bytes wide. Thus, it would seem that 8 byte alignment would be the requirement.

Notes:


Solution

  • x86-64 System V uses x87 for long double, the 80-bit type. And pads it to 16-byte, with alignof(long double) == 16 so a long double will never cross a cache-line boundary. (Worth it or not, IDK; likely SSE2 was one of the motivations for supporting 16-byte alignment cheaply).

    But anyway, SSE stuff isn't the only thing contributing to alignof(max_align_t) == 16 (which sets the minimum alignment that malloc is allowed to return).

    The existence of__m128i doesn't directly contribute to max_align_t at all, for example 32-bit C implementations support it with lower malloc guarantees. Certainly the existence of __m256i on systems supporting AVX didn't increase the alignment guarantees for allocators. (How to solve the 32-byte-alignment issue for AVX load/store operations?). But certainly it's convenient for vectorization, both auto and manual, that malloced memory is aligned enough for movaps, especially on older CPUs when x86-64 was new and movups had penalties even when the memory was aligned. It's hard for a compiler to take advantage of that guarantee if it only sees a float*, you could have passed it a pointer into the middle of an allocation. But if it can see the malloc of an output array, it knows it will be aligned if auto-vectorizing a loop that writes to that newly malloced space.

    BTW, ISO C would let malloc for a small allocation (like 1 to 15 bytes) return less-aligned space, since the space could still be used to hold any type that would fit. In C, an object can't require more alignment than its size. (e.g. you can't typedef an int that always has to be at the start of a cache line, or if you do the sizeof expands with padding.)