c, assembly, arm

How can I generate the following ARM assembler output using ARM gcc 7.3?


myfunction:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r3, r0, r0
mov r0, r3
mla r0, r1, r0, r2
bx  lr

I am able to generate everything except for the mov instruction using the following C function.

int myfunction(int r0, int r1, int r2, int r3)
{
  r3 = r0*r0;
  r0 = r3;
  r3 = r0;
  return (r1*r3)+r2;
}

How can I get the compiler to emit the mov that copies r3 into r0 in the generated assembly code?


Solution

  • unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
    {
      return (a*a*b)+c;
    }
    

    Your choices are going to be something like this (unoptimized)

    00000000 <myfunction>:
       0:   e52db004    push    {r11}       ; (str r11, [sp, #-4]!)
       4:   e28db000    add r11, sp, #0
       8:   e24dd014    sub sp, sp, #20
       c:   e50b0008    str r0, [r11, #-8]
      10:   e50b100c    str r1, [r11, #-12]
      14:   e50b2010    str r2, [r11, #-16]
      18:   e51b3008    ldr r3, [r11, #-8]
      1c:   e51b2008    ldr r2, [r11, #-8]
      20:   e0010392    mul r1, r2, r3
      24:   e51b200c    ldr r2, [r11, #-12]
      28:   e0000291    mul r0, r1, r2
      2c:   e51b3010    ldr r3, [r11, #-16]
      30:   e0803003    add r3, r0, r3
      34:   e1a00003    mov r0, r3
      38:   e28bd000    add sp, r11, #0
      3c:   e49db004    pop {r11}       ; (ldr r11, [sp], #4)
      40:   e12fff1e    bx  lr
    

    or this (optimized)

    00000000 <myfunction>:
       0:   e0030090    mul r3, r0, r0
       4:   e0202391    mla r0, r1, r3, r2
       8:   e12fff1e    bx  lr
    

    as you have probably figured out.
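As a side note on why the parameter names in the question make no difference: r0..r3 are ordinary C identifiers there, not register bindings, so the assignments are simply copy-propagated away. A minimal sketch of that equivalence follows; the function names question_version and simplified_version are hypothetical and only serve the comparison.

```c
/* The function from the question. The names r0..r3 are plain C
   identifiers; they do not tie the variables to ARM registers. */
int question_version(int r0, int r1, int r2, int r3)
{
    r3 = r0 * r0;
    r0 = r3;
    r3 = r0;
    return (r1 * r3) + r2;
}

/* What is left once the copies are propagated away (hypothetical name). */
int simplified_version(int a, int b, int c)
{
    return (a * a * b) + c;
}
```

Both functions compute the same value for inputs that do not overflow, which is why the optimizer is free to drop the intermediate copy.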

    The compiler backend should never consider the mov, as it just wastes an instruction: r3 can go straight into the mla, so there is no need to copy it to r0 first. I am not quite sure how to get the compiler to do more. Even this doesn't encourage it

    unsigned int fun ( unsigned int a )
    {
        return(a*a);
    }
    unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
    {
      return (fun(a)*b)+c;
    }
    

    giving

    00000000 <fun>:
       0:   e1a03000    mov r3, r0
       4:   e0000093    mul r0, r3, r0
       8:   e12fff1e    bx  lr
    
    0000000c <myfunction>:
       c:   e0030090    mul r3, r0, r0
      10:   e0202391    mla r0, r1, r3, r2
      14:   e12fff1e    bx  lr
    

    Basically, if you don't optimize you get nowhere near what you were after. If you do optimize, that mov shouldn't be there; it should be easy to optimize out.

    While you can sometimes shape high-level code to nudge the compiler toward particular low-level output, you should not expect to get this exact output that way.

    Unless you use inline asm

    asm
    (
       "mul r3, r0, r0\n"
       "mov r0, r3\n"
       "mla r0, r1, r0, r2\n"
       "bx lr\n"
    );
    

    giving your result

    Disassembly of section .text:
    
    00000000 <.text>:
       0:   e0030090    mul r3, r0, r0
       4:   e1a00003    mov r0, r3
       8:   e0202091    mla r0, r1, r0, r2
       c:   e12fff1e    bx  lr
    

    or real asm

    mul r3, r0, r0
    mov r0, r3
    mla r0, r1, r0, r2
    bx lr
    

    and feed it into gcc rather than as (arm-whatever-gcc -c so.s -o so.o)

    Disassembly of section .text:
    
    00000000 <.text>:
       0:   e0030090    mul r3, r0, r0
       4:   e1a00003    mov r0, r3
       8:   e0202091    mla r0, r1, r0, r2
       c:   e12fff1e    bx  lr
    

    That way you are technically using gcc on the command line, but gcc only acts as a driver and feeds the file to as (a .S file, with a capital S, would be run through the C preprocessor first).
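The file-scope asm shown earlier produces no myfunction symbol, which is why the disassembly only lists .text. One way to keep the C symbol while still fixing the exact instructions is a naked function, sketched below under the assumption that your GCC build supports the naked attribute for ARM and is compiling in ARM (not Thumb) mode. The portable fallback branch is my addition so the file still builds on other targets; it is not part of the original answer.

```c
#if defined(__arm__) && !defined(__thumb__)
/* ARM mode: the compiler emits no prologue/epilogue for a naked
   function, so the body is exactly these four instructions. Arguments
   arrive in r0, r1, r2 and the result is returned in r0 (AAPCS). */
__attribute__((naked))
unsigned int myfunction(unsigned int a, unsigned int b, unsigned int c)
{
    __asm__ (
        "mul r3, r0, r0\n\t"
        "mov r0, r3\n\t"
        "mla r0, r1, r0, r2\n\t"
        "bx  lr\n\t"
    );
}
#else
/* Fallback for any other target (assumption: same arithmetic result). */
unsigned int myfunction(unsigned int a, unsigned int b, unsigned int c)
{
    return (a * a * b) + c;
}
#endif
```

Compiled with something like arm-none-eabi-gcc -O2 -c so.c -o so.o and inspected with objdump, the ARM branch should show the four instructions under the myfunction label, though it is worth checking against your own toolchain.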

    Unless you find a core or a bug where Rd and Rs have to be the same register, and can then specify that core/bug/whatever on the gcc command line, I don't see the mov happening. Maybe, just maybe, with clang/llvm: compile fun and myfunction separately to bitcode, combine them, optimize, output to the target, and then examine the result. I would expect the mov to be removed either during optimization or at output, but you might get lucky.

    Edit

    I made an error; an older gcc does generate the mov:

    unsigned int myfunction(unsigned int a, unsigned int  b, unsigned int c)
    {
      return (a*a*b)+c;
    }
    
    arm-linux-gnueabi-gcc --version
    arm-linux-gnueabi-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    
    Disassembly of section .text:
    
    00000000 <myfunction>:
       0:   e0030090    mul r3, r0, r0
       4:   e1a00003    mov r0, r3
       8:   e0202091    mla r0, r1, r0, r2
       c:   e12fff1e    bx  lr
    

    but this

    arm-none-eabi-gcc --version
    arm-none-eabi-gcc (GCC) 8.2.0
    Copyright (C) 2018 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    arm-none-eabi-gcc -O2 -c so.c -o so.o
    arm-none-eabi-objdump -D so.o
    
    so.o:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00000000 <myfunction>:
       0:   e0030090    mul r3, r0, r0
       4:   e0202391    mla r0, r1, r3, r2
       8:   e12fff1e    bx  lr
    

    I'll have to build a 7.3 or go find one. Somewhere between 5.x.x and 8.x.x the backend changed or...

    Note that you may need -mcpu=arm7tdmi, -mcpu=arm9tdmi, -march=armv4t or -march=armv5t on the command line, depending on the default target (cpu/arch) built into your compiler. Otherwise you might get Thumb-2 output like this

    Disassembly of section .text:
    
    00000000 <myfunction>:
       0:   fb00 f000   mul.w   r0, r0, r0
       4:   fb01 2000   mla r0, r1, r0, r2
       8:   4770        bx  lr
       a:   bf00        nop
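To check which instruction set a given set of flags actually selects before reading disassembly, you can test the compiler's predefined macros. The sketch below assumes the __arm__, __thumb__ and __thumb2__ macros as predefined by GCC and clang for ARM targets; the function name target_mode is hypothetical.

```c
/* Report the instruction set this translation unit was compiled for,
   based on the compiler's predefined target macros. */
const char *target_mode(void)
{
#if defined(__thumb2__)
    return "thumb2";   /* e.g. a Cortex-M default, as in the output above */
#elif defined(__thumb__)
    return "thumb";    /* 16-bit Thumb */
#elif defined(__arm__)
    return "arm";      /* 32-bit ARM encoding, what the question expects */
#else
    return "not-arm";  /* compiled for some other architecture */
#endif
}
```

If this returns anything other than "arm" for your flags, the 32-bit mul/mla encodings shown in the question will not appear in the output.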
    

    This version

    arm-none-eabi-gcc --version
    arm-none-eabi-gcc (GCC) 7.3.0
    Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    produces

    Disassembly of section .text:
    
    00000000 <myfunction>:
       0:   e0030090    mul r3, r0, r0
       4:   e0202391    mla r0, r1, r3, r2
       8:   e12fff1e    bx  lr
    

    So you may have to work backward to find the version where the behavior changed, identify the gcc source change that caused it, and modify 7.3.0 accordingly. The result would not really be 7.3.0, but it would report as 7.3.0 and output your desired code.