What is the 'veneer' that the ARM linker uses in function calls?


I just read https://www.keil.com/support/man/docs/armlink/armlink_pge1406301797482.htm but can't understand what the veneer is that the ARM linker inserts between function calls.

In "Procedure Call Standard for the ARM Architecture" document, it says,

5.3.1.1 Use of IP by the linker

Both the ARM- and Thumb-state BL instructions are unable to address the full 32-bit address space, so it may be necessary for the linker to insert a veneer between the calling routine and the called subroutine. Veneers may also be needed to support ARM-Thumb inter-working or dynamic linking. Any veneer inserted must preserve the contents of all registers except IP (r12) and the condition code flags; a conforming program must assume that a veneer that alters IP may be inserted at any branch instruction that is exposed to a relocation that supports inter-working or long branches.

Note: R_ARM_CALL, R_ARM_JUMP24, R_ARM_PC24, R_ARM_THM_CALL, R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19 are examples of the ELF relocation types with this property. See [AAELF] for full details.

Here is my guess. Is it something like this? When function A calls function B and the two are too far apart for the bl instruction to reach, the linker inserts a function C between A and B, placed close to function B. Function A then uses a b instruction to go to function C (preserving all the registers across the call), and function C uses a bl instruction to reach function B (again preserving all the registers). The r12 register is used to hold the remaining bits of the long jump address. Is this what veneer means? (I don't know why ARM explains only what a veneer provides and not what a veneer is.)


Solution

  • It is just a trampoline. Interworking is the easier one to demonstrate. I am using gnu here, but the implication is that Keil has a solution as well.

    .globl even_more
    .type even_more,%function
    even_more:
        bx lr
    
    .thumb
    
    .globl more_fun
    .thumb_func
    more_fun:
        bx lr
    
    
    
    extern unsigned int more_fun ( unsigned int x );
    extern unsigned int even_more ( unsigned int x );
    unsigned int fun ( unsigned int a )
    {
        return(more_fun(a)+even_more(a));
    }
        
    Unlinked object:
    
    Disassembly of section .text:
    
    00000000 <fun>:
       0:   e92d4070    push    {r4, r5, r6, lr}
       4:   e1a05000    mov r5, r0
       8:   ebfffffe    bl  0 <more_fun>
       c:   e1a04000    mov r4, r0
      10:   e1a00005    mov r0, r5
      14:   ebfffffe    bl  0 <even_more>
      18:   e0840000    add r0, r4, r0
      1c:   e8bd4070    pop {r4, r5, r6, lr}
      20:   e12fff1e    bx  lr
    
    Linked binary (yes completely unusable, but demonstrates what the tool does)
    
    Disassembly of section .text:
    
    00001000 <fun>:
        1000:   e92d4070    push    {r4, r5, r6, lr}
        1004:   e1a05000    mov r5, r0
        1008:   eb000008    bl  1030 <__more_fun_from_arm>
        100c:   e1a04000    mov r4, r0
        1010:   e1a00005    mov r0, r5
        1014:   eb000002    bl  1024 <even_more>
        1018:   e0840000    add r0, r4, r0
        101c:   e8bd4070    pop {r4, r5, r6, lr}
        1020:   e12fff1e    bx  lr
    
    00001024 <even_more>:
        1024:   e12fff1e    bx  lr
    
    00001028 <more_fun>:
        1028:   4770        bx  lr
        102a:   46c0        nop         ; (mov r8, r8)
        102c:   0000        movs    r0, r0
        ...
    
    00001030 <__more_fun_from_arm>:
        1030:   e59fc000    ldr r12, [pc]   ; 1038 <__more_fun_from_arm+0x8>
        1034:   e12fff1c    bx  r12
        1038:   00001029    .word   0x00001029
        103c:   00000000    .word   0x00000000
    

    You cannot use bl to switch modes between arm and thumb, so the linker has added a trampoline (as I call it, or have heard it called) that you hop on and off to get to the destination. In this case it essentially converts the branch part of the bl into a bx. For the link part it takes advantage of the original bl, which has already set lr before the trampoline runs. Note that the trampoline only touches r12, the one register the calling standard lets a veneer alter. You can see this done for thumb to arm or arm to thumb.
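
    To see why the word stored in the trampoline is 0x00001029 rather than 0x00001028, here is a minimal sketch of how bx interprets a register value: bit 0 selects thumb state and the remaining bits give the address. The struct and function names are made up for this illustration and do not come from any toolchain.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* How a BX-style interworking branch splits the value in its register. */
    struct bx_target {
        uint32_t address;  /* where execution continues (bit 0 cleared) */
        bool thumb;        /* true: continue in thumb state, false: arm state */
    };

    /* Hypothetical helper: decode a value the way bx treats it. */
    static struct bx_target decode_bx_target(uint32_t value)
    {
        struct bx_target t;
        t.thumb = (value & 1u) != 0;   /* bit 0 picks the instruction set */
        t.address = value & ~1u;       /* bit 0 is not part of the address */
        /* An arm-state destination is expected to be word aligned. */
        assert(t.thumb || (value & 2u) == 0);
        return t;
    }
    ```

    Feeding it the literal from the listing, decode_bx_target(0x00001029) gives address 0x1028 (more_fun) with thumb set, which is the mode switch that a plain bl could not perform.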

    The even_more function is in the same mode (ARM) so no need for the trampoline/veneer.
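
    You can confirm where each bl in the linked listing lands by decoding the instruction word by hand. The sketch below assumes the arm-state (A32) bl encoding: a signed 24-bit word offset in the low bits, applied relative to the instruction address plus 8. The function name arm_bl_target is my own, only for illustration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Compute the destination of an arm-state bl.
     * insn_addr: address of the bl itself, insn: its 32-bit encoding. */
    static uint32_t arm_bl_target(uint32_t insn_addr, uint32_t insn)
    {
        /* Bits 27..24 are 0b1011 for bl; condition 0b1111 is a different
         * instruction (blx) and is not handled by this sketch. */
        assert((insn & 0x0F000000u) == 0x0B000000u);
        assert((insn >> 28) != 0xFu);

        uint32_t imm24 = insn & 0x00FFFFFFu;

        /* Sign-extend the 24-bit field, then scale words to bytes. */
        int32_t offset = (int32_t)imm24;
        if (imm24 & 0x00800000u)
            offset -= 0x01000000;
        offset *= 4;

        /* In arm state the pc reads as the instruction address + 8. */
        return insn_addr + 8u + (uint32_t)offset;
    }
    ```

    With the words from the listing, arm_bl_target(0x1008, 0xeb000008) works out to 0x1030 (the trampoline) and arm_bl_target(0x1014, 0xeb000002) to 0x1024 (even_more directly).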

    For the distance limit of bl, let me see. Wow, that was easy, and gnu called it a veneer as well:

    .globl more_fun
    .type more_fun,%function
    more_fun:
        bx lr
    
    extern unsigned int more_fun ( unsigned int x );
    unsigned int fun ( unsigned int a )
    {
        return(more_fun(a)+1);
    }
    
    MEMORY
    {
        bob : ORIGIN = 0x00000000, LENGTH = 0x1000
        ted : ORIGIN = 0x20000000, LENGTH = 0x1000
    }
    SECTIONS
    {
        .some   : { so.o(.text*)       } > bob
        .more   : { more.o(.text*)      } > ted
    }
    
    Disassembly of section .some:
    
    00000000 <fun>:
       0:   e92d4010    push    {r4, lr}
       4:   eb000003    bl  18 <__more_fun_veneer>
       8:   e8bd4010    pop {r4, lr}
       c:   e2800001    add r0, r0, #1
      10:   e12fff1e    bx  lr
      14:   00000000    andeq   r0, r0, r0
    
    00000018 <__more_fun_veneer>:
      18:   e51ff004    ldr pc, [pc, #-4]   ; 1c <__more_fun_veneer+0x4>
      1c:   20000000    .word   0x20000000
    
    Disassembly of section .more:
    
    20000000 <more_fun>:
    20000000:   e12fff1e    bx  lr
    

    Staying in the same mode, it did not need the bx; the veneer loads the full 32-bit destination address straight into pc.
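
    Both veneers find their destination through a pc-relative load of a literal word placed next to them. A small sketch of that address arithmetic, assuming arm state where the pc reads as the instruction address plus 8 (the helper name is hypothetical):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Address read by an arm-state "ldr Rt, [pc, #imm]" located at insn_addr. */
    static uint32_t arm_pc_literal_addr(uint32_t insn_addr, int32_t imm)
    {
        /* arm-state instructions are word aligned. */
        assert((insn_addr & 3u) == 0);

        /* The pc operand reads as the instruction address + 8. */
        return insn_addr + 8u + (uint32_t)imm;
    }
    ```

    For the ldr pc, [pc, #-4] at 0x18 this gives 0x1c, the word holding 0x20000000. For the ldr r12, [pc] at 0x1030 in the interworking example it gives 0x1038, the word holding 0x00001029.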

    The alternative is that you replace every bl instruction at compile time with a more complicated solution just in case you need to do a far call. Or, since the bl offset/immediate is computed at link time anyway, the linker can put the trampoline/veneer in only where it is needed to change modes or cover the distance.
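
    The link-time decision can be sketched roughly as below. This is not how any particular linker is implemented. It only mirrors the two listings above for an arm-state caller, using the arm-state bl reach of a signed 24-bit word offset (about +/-32MB). The enum and function names are hypothetical, and a real toolchain may have other options, for example using blx on architectures that provide it.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    enum isa_state { STATE_ARM, STATE_THUMB };

    /* True if an arm-state bl at insn_addr can encode a branch to target. */
    static bool arm_bl_in_range(uint32_t insn_addr, uint32_t target)
    {
        assert((insn_addr & 3u) == 0);

        /* The offset is measured from the instruction address + 8. */
        int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);

        /* Signed 24-bit word offset: -2^25 .. 2^25 - 4 bytes, multiple of 4. */
        return offset % 4 == 0
            && offset >= -(INT64_C(1) << 25)
            && offset <= (INT64_C(1) << 25) - 4;
    }

    /* Hypothetical policy for an arm-state caller: go through a veneer when
     * the callee is in the other state or is out of reach of a plain bl. */
    static bool needs_veneer(uint32_t insn_addr, uint32_t target,
                             enum isa_state callee)
    {
        return callee != STATE_ARM || !arm_bl_in_range(insn_addr, target);
    }
    ```

    Under these assumptions the call at 0x1008 to the thumb function more_fun needs a veneer, the call at 0x1014 to even_more does not, and the call at 0x4 to more_fun at 0x20000000 needs one because the offset is far beyond 32MB.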

    You should be able to repeat this yourself with the Keil tools; all you need to do is either switch modes on an external function call or exceed the reach of the bl instruction.

    Edit

    Understand that toolchains vary, even from version to version within a toolchain: gcc 3.x.x was the first to support thumb, and I do not know that I saw this back then. Note the linker is part of binutils, which is a separate development from gcc. You mention "arm linker"; well, arm has its own toolchain, then they bought Keil and perhaps replaced Keil's with their own, or not. Then there is gnu and clang/llvm and others. So it is not a case of the "arm linker" doing this or that; it is a case of each toolchain's linker doing this or that. First, each toolchain is free to use whatever calling convention it wants; there is no mandate that it has to use ARM's recommendations. Second, it can choose to implement this or not, or simply give you a warning and leave you to deal with it (likely in assembly language or through function pointers).

    ARM does not need to explain it, or let us say, it is clearly explained in the Architectural Reference Manual for a particular architecture (look at the bl instruction and the bx instruction, look for the word interworking, etc.; all quite clearly explained). So there is no reason to explain it again. Especially for a generic statement, where the reach of bl varies and each architecture has different interworking features, it would take a long set of paragraphs or a short chapter to explain something that is already clearly documented.

    Anyone implementing a compiler and linker would be well versed in the instruction set beforehand and would understand the bl, conditional branch, and other limitations of the instruction set. Some instruction sets offer near and far jumps, and in some of those the assembly language uses the same mnemonic for both, so if the assembler does not see the label in the same file it will often implement a far jump/call rather than a near one so that the objects can be linked.

    In any case, before linking you have to compile and assemble, and the toolchain folks will have fully understood the rules of the architecture. ARM is not special here.