I was experimenting with the -falign-loops option to optimize a for-loop on our embedded architecture. However, I noticed that when the alignment requires more than a single nop instruction, the compiler emits one nop followed by as many zeros (0000) as needed to fill the gap.
I suspect it is a bug in our compiler, but can someone confirm it is not a bug in GCC?
Here's a code snippet:
    __asm__ volatile("nop");
    __asm__ volatile("nop");
    for (j0 = 0; j0 < N; j0 += 4)
    {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
Compile with -falign-loops=8 (or whatever value is relevant to your architecture, as long as it exceeds the required minimum alignment). You can add or remove the __asm__ lines as necessary to produce a misaligned loop body.
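For anyone who wants to try this, here is a minimal sketch of a complete foo.c built around the snippet. The value of N, the int element type, the global arrays, and the function name add_arrays are all assumptions made for illustration; none of them are shown in the original code.

```c
/* foo.c: self-contained reproduction sketch.
 * N, the element type, the global arrays and the function name are
 * assumptions; the original snippet does not show them. */
#include <assert.h>

#define N 64 /* assumed size; must be a multiple of the unroll factor 4 */
static_assert(N % 4 == 0, "N must be a multiple of 4");

int a[N], b[N], c[N];

void add_arrays(void)
{
    int j0;

    /* Add or remove nops to shift the address at which the loop starts. */
    __asm__ volatile("nop");
    __asm__ volatile("nop");

    for (j0 = 0; j0 < N; j0 += 4)
    {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
}
```

The file has no main, which is enough for inspecting the generated assembly. An optimization level such as -O2 is probably needed alongside -falign-loops=8, since GCC generally applies loop alignment only when optimizing; that is an assumption about your toolchain, so check what your port does.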
Use gcc -S -o foo.s foo.c to generate the assembly output without assembling it. I suspect you'll see a .balign or .p2align directive in the asm. The padding for such a directive is chosen by the assembler, not the compiler, so assuming the directive is intended to work, I think it's a bug in the assembler.
It's also possible that you've put the code in a non-default section (i.e. not .text), either intentionally or accidentally (e.g. with a misplaced .data or .section in some other inline asm). Normally the assembler pads with nop instructions of the proper size and number in sections that contain code, and with 0 bytes in sections that contain data.
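To illustrate the accidental section switch, here is a hedged sketch of inline asm that defines some data without leaving the assembler in a data section. It assumes GNU as on an ELF target, where .pushsection and .popsection are available; the symbol lookup_table and the function table_sum are hypothetical names invented for this example, and your embedded assembler may not support these directives.

```c
#include <stdint.h>

/* Risky pattern (shown only as a comment): a bare ".data" with no switch
 * back leaves the assembler in .data, so code emitted afterwards, and the
 * padding of any alignment directive in it, can end up in a data section:
 *
 *     __asm__(".data\n\tlookup_table: .long 1, 2, 3");
 */

/* Safer pattern: save the current section, emit the data, then restore. */
__asm__(".pushsection .data\n\t"
        ".balign 4\n\t"
        ".globl lookup_table\n"
        "lookup_table:\n\t"
        ".long 1, 2, 3\n\t"
        ".popsection");

/* Hypothetical symbol defined by the asm block above. */
extern int32_t lookup_table[3];

int32_t table_sum(void)
{
    return lookup_table[0] + lookup_table[1] + lookup_table[2];
}
```

If the -S output shows a .data or .section directive ahead of the loop with no matching switch back to .text, that would point to this cause rather than to the assembler's padding logic.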