c gcc for-loop alignment memory-alignment

Is this a GCC bug when using -falign-loops option?

I was experimenting with this option to optimize a for-loop on our embedded architecture (here). However, I noticed that when the alignment requires more than a single nop instruction of padding, the compiler generates one nop followed by as many zeros (0000) as required to fill the gap.

I suspect it is a bug in our compiler, but can someone confirm it is not a bug in GCC?

Here's a code snippet:

    __asm__ volatile("nop");  
    __asm__ volatile("nop");  

    for (j0=0; j0<N; j0+=4)
    {
        c[j0+ 0] = a[j0+ 0] + b[j0+ 0];
        c[j0+ 1] = a[j0+ 1] + b[j0+ 1];
        c[j0+ 2] = a[j0+ 2] + b[j0+ 2];
        c[j0+ 3] = a[j0+ 3] + b[j0+ 3];
    }

Compile with -falign-loops=8 (or any value relevant to your architecture that is larger than the minimum required alignment). You can add or remove the __asm__ lines as needed to produce a misaligned loop body.

Solution

Use gcc -S -o foo.s foo.c to generate the assembly output without assembling it. I suspect you'll see a .balign or .p2align directive in the asm. If so, GCC has done its part: the compiler only emits the directive, and the assembler chooses the padding bytes. So, assuming the directive is intended to work on your target, I think it's a bug in the assembler. It's also possible that the code has ended up in a non-default section (i.e. not .text), either intentionally or accidentally (e.g. through a misplaced .data or .section directive in some other inline asm). Normally the assembler pads with nop instructions of the proper size and number in sections that contain code, and with 0 bytes in sections that contain data.
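
To make that check concrete, here is a minimal sketch of a self-contained foo.c built around the loop from the question. The function name add4, the int element type, and the value of N are my assumptions, since the original declarations weren't shown; the -O2 flag in the comments is also an assumption, because the question doesn't state an optimization level. It assumes your target's assembler accepts the plain nop mnemonic.

```c
/* foo.c: hypothetical reproducer for the loop in the question.
 * add4, the int element type and N are illustrative assumptions.
 *
 * Step 1: look at what the compiler emits, before the assembler runs:
 *   gcc -O2 -falign-loops=8 -S -o foo.s foo.c
 *   grep -n -E '\.(p2align|balign)' foo.s
 *
 * Step 2: assemble and disassemble to see how the padding was filled:
 *   gcc -O2 -falign-loops=8 -c -o foo.o foo.c
 *   objdump -d foo.o
 */
#include <stddef.h>

#define N 16  /* assumed array length, a multiple of 4 */

void add4(int *c, const int *a, const int *b)
{
    size_t j0;

    /* Extra nops shift the start of the loop; add or remove them
     * to push the loop body off its natural alignment. */
    __asm__ volatile("nop");
    __asm__ volatile("nop");

    for (j0 = 0; j0 < N; j0 += 4) {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
}
```

If the alignment directive shows up in foo.s just before the loop label, the compiler side is fine and the zeros come from the assembler (or from the section the code was placed in). If no directive appears at all, the problem is on the compiler side instead.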