More efficient Asm with unconventional for-loop?


I was playing around with Compiler Explorer, trying to learn a little more about ARM assembly. I'm using arm64 msvc v19.latest. I noticed that I get one branch fewer when I write the loop like this:

int main(){
    for(unsigned i = 0; i<8;)
        i++;
    return 0;
}

compared to the "conventional" way of writing a for-loop like this:

int main(){
    for(unsigned i = 0; i<8;i++)
        ;
    return 0;
}

Is it therefore more efficient to write the for-loop in the unconventional way? I'll paste both assembly listings for comparison. First, the unconventional version:

        ;Flags[SingleProEpi] functionLength[52] RegF[0] RegI[0] H[0] frameChainReturn[UnChained] frameSize[16]

|main|  PROC
|$LN6|
        sub         sp,sp,#0x10
        mov         w8,#0
        str         w8,[sp]
|$LN2@main|
        ldr         w8,[sp]
        cmp         w8,#8
        bhs         |$LN3@main|
        ldr         w8,[sp]
        add         w8,w8,#1
        str         w8,[sp]
        b           |$LN2@main|
|$LN3@main|
        mov         w0,#0
        add         sp,sp,#0x10
        ret

        ENDP  ; |main|

and the conventional way:

        ;Flags[SingleProEpi] functionLength[56] RegF[0] RegI[0] H[0] frameChainReturn[UnChained] frameSize[16]

|main|  PROC
|$LN6|
        sub         sp,sp,#0x10
        mov         w8,#0
        str         w8,[sp]
        b           |$LN4@main|
|$LN2@main|
        ldr         w8,[sp]
        add         w8,w8,#1
        str         w8,[sp]
|$LN4@main|
        ldr         w8,[sp]
        cmp         w8,#8
        bhs         |$LN3@main|
        b           |$LN2@main|
|$LN3@main|
        mov         w0,#0
        add         sp,sp,#0x10
        ret

        ENDP  ; |main|

Solution

  • If you want optimized code, ask your compiler for it! There's no point in examining how efficient unoptimized code is.

    With optimization enabled (-O3 in the demos below; MSVC's counterpart is /O2), the loop is eliminated completely.

    Compiler Explorer demo: standard
    Compiler Explorer demo: non-standard

    If we add something with a side effect to the loop body so that it can't be removed, both approaches compile to exactly the same code.

    Compiler Explorer demo: standard
    Compiler Explorer demo: non-standard

    That optimized code is the equivalent of:

    printf("%d\n", 1);
    printf("%d\n", 2);
    printf("%d\n", 3);
    printf("%d\n", 4);
    printf("%d\n", 5);
    printf("%d\n", 6);
    printf("%d\n", 7);
    printf("%d\n", 8);
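
    For reference, here is a minimal sketch of what the two loop forms could look like once the body has a side effect. The source behind the demos isn't reproduced here, so this is an assumption: the function names print_conventional and print_unconventional are hypothetical, and the output goes into a caller-supplied buffer via snprintf rather than through printf so that the two results can be compared directly.

```c
#include <stdio.h>
#include <string.h>

/* Conventional form: the increment sits in the for-header.
 * Hypothetical helper; writes "1\n2\n...8\n" into buf and
 * returns the number of characters written. */
size_t print_conventional(char *buf, size_t cap)
{
    size_t used = 0;
    for (unsigned i = 0; i < 8; i++) {
        /* i + 1 so the output runs 1..8, matching the unrolled code */
        int n = snprintf(buf + used, cap - used, "%u\n", i + 1);
        if (n < 0 || (size_t)n >= cap - used)
            break;                  /* buffer too small: stop early */
        used += (size_t)n;
    }
    return used;
}

/* Unconventional form: the increment is moved into the loop body. */
size_t print_unconventional(char *buf, size_t cap)
{
    size_t used = 0;
    for (unsigned i = 0; i < 8;) {
        i++;                        /* increment first, then use i */
        int n = snprintf(buf + used, cap - used, "%u\n", i);
        if (n < 0 || (size_t)n >= cap - used)
            break;                  /* buffer too small: stop early */
        used += (size_t)n;
    }
    return used;
}
```

    Both functions visit the same values in the same order, so an optimizer is free to treat them identically. Pasting each one into Compiler Explorer with optimization turned on is one way to check whether your compiler actually does.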