Assembly language optimizer


When you compile C++ or any other compiled language, an optimizer runs and rewrites some of the code in a more efficient way. Since you don't compile assembly (ARM assembly included) in the sense that you compile a high-level language, is there an optimizer running, or does the computer run exactly what you type?


Solution

  • Consider this snippet, assembled with nasm -f elf64 -O0 (optimization disabled):

              section .rodata
    Prompt:   db      'Prompting Text', 0

              section .text
              global  _start

    _start:   xor     rax, rax
              mov     rsi, Prompt
              jmp     Done
              nop
              nop
              nop
              nop
    Done:     xor     rdi, rdi
              mov     eax, 60
              syscall
    

    The resulting object code is:

    000  4831C0            xor rax,rax
    003  48BE000000000000  mov rsi,0x0
             -0000
    00D  E904000000        jmp qword 0x16
    012  90                nop
    013  90                nop
    014  90                nop
    015  90                nop
    016  4831FF            xor rdi,rdi
    019  B83C000000        mov eax,0x3c
    01E  0F05              syscall
    020
    

    Assembled with the default optimization (nasm -f elf64), the only change is that the assembler works out that the jump target fits in a signed 8-bit displacement (-128 to +127 bytes), so it emits the short form of jmp, saving 3 bytes.

    00  4831C0            xor rax,rax
    03  48BE000000000000  mov rsi,0x0
             -0000
    0D  EB04              jmp short 0x13
    0F  90                nop
    10  90                nop
    11  90                nop
    12  90                nop
    13  4831FF            xor rdi,rdi
    16  B83C000000        mov eax,0x3c
    1B  0F05              syscall
    1D 
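The short-versus-near choice can be sketched in a few lines of Python. This is only an illustrative model, not NASM's actual code: the function name `jmp_encoding` and its `optimize` flag are made up for the example, and the addresses are taken from the two listings.

```python
def jmp_encoding(jmp_addr: int, target: int, optimize: bool = True) -> bytes:
    """Encode a direct x86 jmp located at jmp_addr that lands on target.

    Hypothetical helper for illustration; it assumes final addresses are known.
    """
    if optimize:
        # Short form: opcode EB + rel8, measured from the end of the 2-byte instruction.
        rel8 = target - (jmp_addr + 2)
        if -128 <= rel8 <= 127:
            return b"\xEB" + rel8.to_bytes(1, "little", signed=True)
    # Near form: opcode E9 + rel32, measured from the end of the 5-byte instruction.
    rel32 = target - (jmp_addr + 5)
    return b"\xE9" + rel32.to_bytes(4, "little", signed=True)


print(jmp_encoding(0x0D, 0x16, optimize=False).hex())  # e904000000 (the -O0 listing)
print(jmp_encoding(0x0D, 0x13).hex())                  # eb04 (the default listing)
```

Note that the target address itself moves from 0x16 to 0x13 once the jump shrinks, which is why a real assembler needs more than one pass to settle on sizes; the sketch sidesteps that by taking the final addresses as inputs.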
    

    Now modify the source so that it asks for the shorter encodings explicitly, instead of relying on the assembler's optimization option:

              section .rodata
    Prompt:   db      'Prompting Text', 0

              section .text
              global  _start

    _start:   xor     eax, eax
              mov     esi, Prompt
              jmp     short Done
              nop
              nop
              nop
              nop
    Done:     xor     edi, edi
              mov     eax, 60
              syscall
    

    and the result is:

    00  31C0              xor eax,eax
    02  BE00000000        mov esi,0x0
    07  EB04              jmp short 0xd
    09  90                nop
    0A  90                nop
    0B  90                nop
    0C  90                nop
    0D  31FF              xor edi,edi
    0F  B83C000000        mov eax,0x3c
    14  0F05              syscall
    16  
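To see where the bytes went, the encodings from the last two listings can be tallied. The Python below is just a worked check of the arithmetic; the hex strings are copied from the listings, and the names `ENCODINGS` and `bytes_saved` are invented for the example.

```python
# (description, encoding with 64-bit operands, encoding with 32-bit operands)
ENCODINGS = [
    ("xor rax,rax -> xor eax,eax", "4831C0", "31C0"),
    ("mov rsi,imm64 -> mov esi,imm32", "48BE" + "00" * 8, "BE" + "00" * 4),
    ("xor rdi,rdi -> xor edi,edi", "4831FF", "31FF"),
]


def bytes_saved(wide_hex: str, narrow_hex: str) -> int:
    """Size difference in bytes between two hex-encoded instructions."""
    return len(bytes.fromhex(wide_hex)) - len(bytes.fromhex(narrow_hex))


savings = {name: bytes_saved(wide, narrow) for name, wide, narrow in ENCODINGS}
total = sum(savings.values())

for name, saved in savings.items():
    print(f"{name}: {saved} byte(s) saved")
print("total:", total)  # 7, the difference between end offsets 0x1D and 0x16
```

Most of the saving comes from the mov, whose immediate drops from 8 bytes to 4; each xor only loses its one-byte 48 prefix.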
    

    This varies between assemblers, but my contention, as @Ped7g has already pointed out, is that it is best to know the instruction set, so that there is a direct correspondence between what you have written and the resulting object code.

    In case you're not aware, in 64-bit mode a write to a 32-bit register is zero-extended into the full 64-bit register. That's why xor eax, eax yields the same result as xor rax, rax but saves 1 byte: the 48 (REX.W) prefix is not needed.
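In 64-bit mode, writing the 32-bit half of a register clears the upper 32 bits of the full register. A rough Python model of that rule follows; the helper names are hypothetical and registers are modelled as plain integers, so treat it as a sketch of the behaviour rather than an emulator.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write_reg32(value: int) -> int:
    """Full 64-bit register contents after a 32-bit write: upper half is zeroed."""
    return value & MASK32


def xor_eax_eax(rax: int) -> int:
    eax = rax & MASK32
    return write_reg32(eax ^ eax)


def xor_rax_rax(rax: int) -> int:
    return (rax ^ rax) & MASK64


rax = MASK64  # start with every bit set
print(xor_eax_eax(rax) == xor_rax_rax(rax) == 0)  # True

# mov eax, -1 gives 0x00000000FFFFFFFF: the upper half is zero, not copies of the sign bit.
print(hex(write_reg32(-1)))  # 0xffffffff
```

The second print is the case that separates zero-extension from sign-extension: with sign-extension the full register would end up as all ones.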