It appears gcc will happily auto-vectorize simple examples, and emit SSE instructions. Is there any way to emit MMX instructions only?
For example, if I try the following on Godbolt:
int sumint(int *arr) {
    int sum = 0;
    for (int i = 0; i < 2048; i++) {
        sum += arr[i];
    }
    return sum;
}
compiling with GCC 9.2 and -mmmx -O3 -m32 -msse2, I get:
sumint:
        mov     eax, DWORD PTR [esp+4]
        pxor    xmm0, xmm0
        lea     edx, [eax+8192]
.L2:
        movdqu  xmm2, XMMWORD PTR [eax]
        add     eax, 16
        paddd   xmm0, xmm2
        cmp     edx, eax
        jne     .L2
        movdqa  xmm1, xmm0
        psrldq  xmm1, 8
        paddd   xmm0, xmm1
        movdqa  xmm1, xmm0
        psrldq  xmm1, 4
        paddd   xmm0, xmm1
        movd    eax, xmm0
        ret
But without SSE2 (i.e. -mmmx -O3 -m32 -mno-sse2), it falls back to using only general-purpose registers, with no MMX instructions:
sumint:
        mov     eax, DWORD PTR [esp+4]
        xor     edx, edx
        lea     ecx, [eax+8192]
.L2:
        add     edx, DWORD PTR [eax]
        add     eax, 4
        cmp     eax, ecx
        jne     .L2
        mov     eax, edx
        ret
I wanted to run some benchmarks comparing the effect of running with just the x87 FPU, MMX, SSE and SSE2, but if GCC won't emit MMX instructions, there won't be any difference between compiling for x87 and for x87+MMX.
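GCC can't auto-vectorize using MMX or 3DNow! because it lacks the ability to properly insert EMMS/FEMMS. The MMX registers alias the x87 register stack, so an EMMS (or FEMMS with 3DNow!) has to be executed after MMX code and before any later x87 floating-point code. You have to use ICC for MMX. See https://gcc.gnu.org/ml/gcc-patches/2004-12/msg01955.html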
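If auto-vectorization is not available, one possible workaround for the benchmark is to write the MMX loop by hand with the intrinsics from <mmintrin.h>. The following is only a rough sketch, not a tuned implementation: the function name sumint_mmx and the extra length parameter n are made up for illustration, and it assumes n is even. Note the explicit _mm_empty() (EMMS) at the end, which is exactly the step the compiler cannot place on its own.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi32, _mm_empty, ... */

/* Hypothetical helper (not from the question): sums n ints two at a time
 * in a 64-bit MMX register. Assumes n is a multiple of 2. */
int sumint_mmx(const int *arr, int n) {
    __m64 acc = _mm_setzero_si64();              /* two 32-bit partial sums, both 0 */
    for (int i = 0; i < n; i += 2) {
        /* _mm_set_pi32(high, low): pack two adjacent ints into one __m64 */
        __m64 v = _mm_set_pi32(arr[i + 1], arr[i]);
        acc = _mm_add_pi32(acc, v);              /* PADDD on 64-bit registers */
    }
    /* Horizontal add: low lane + high lane */
    int lo = _mm_cvtsi64_si32(acc);
    int hi = _mm_cvtsi64_si32(_mm_srli_si64(acc, 32));
    _mm_empty();                                 /* EMMS: hand the registers back to x87 */
    return lo + hi;
}
```

For the array in the question this would be called as sumint_mmx(arr, 2048). Compiling with something like -O2 -m32 -mmmx -mno-sse should give a loop using the mm registers, though it is worth checking the generated assembly, since the exact instructions GCC picks depend on the version and target flags.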