gcc assembly arm reverse-engineering thumb

Getting a label address into a register in Thumb assembly - ARMv5


I am trying to get the address of a label in Thumb assembly and I am having some trouble.

I have already read this post, but it does not help me, and I will explain why.

I am writing a simple program in Thumb assembly (unfortunately I cannot use Thumb2).

Let's consider this code:

 .arch armv5te
 .syntax unified
 .text

 .thumb
 .thumb_func
 thumbnow:
     PUSH {LR}              @ at 0x0
     LDR R0, =loadValues    @ at 0x2
     POP {PC}               @ at 0x4
 .align
 loadValues:
     .word 0xdeadbee1       @ at 0x8
     .word 0xdeadbee2       @ at 0xC
     .word 0xdeadbee3       @ at 0x10

I am using the arm-linux-gnueabi toolchain to assemble it.

My microcontroller doesn't have an MMU, so the memory addresses are static: no virtual pages etc.

What I am trying to do is get R0 to hold the value 0x8 here, so that I can then access the three words like this:

LDR R1, [R0]
LDR R2, [R0,#4]
LDR R3, [R0,#8]

This does not seem possible with LDR, though, because the value does not fit in a MOV instruction. The assembler documentation states that if the value cannot fit in a MOV instruction, it is placed in a literal pool instead.

So my question is: is it possible in Thumb assembly to get the actual address of the label if that address cannot fit in a MOV instruction?


Solution

  • Starting with this

    .thumb
    
        ldr r0,=hello
        adr r0,hello
    
    nop
    nop
    nop
    nop
    hello:
        .word 0,1,2,3
    

    gives this when assembled but not yet linked:

    00000000 <hello-0xc>:
       0:   4806        ldr r0, [pc, #24]   ; (1c <hello+0x10>)
       2:   a002        add r0, pc, #8  ; (adr r0, c <hello>)
       4:   46c0        nop         ; (mov r8, r8)
       6:   46c0        nop         ; (mov r8, r8)
       8:   46c0        nop         ; (mov r8, r8)
       a:   46c0        nop         ; (mov r8, r8)
    
    0000000c <hello>:
       c:   00000000    andeq   r0, r0, r0
      10:   00000001    andeq   r0, r0, r1
      14:   00000002    andeq   r0, r0, r2
      18:   00000003    andeq   r0, r0, r3
      1c:   0000000c    andeq   r0, r0, r12
    

    and this once linked at address 0x1000:

    00001000 <hello-0xc>:
        1000:   4806        ldr r0, [pc, #24]   ; (101c <hello+0x10>)
        1002:   a002        add r0, pc, #8  ; (adr r0, 100c <hello>)
        1004:   46c0        nop         ; (mov r8, r8)
        1006:   46c0        nop         ; (mov r8, r8)
        1008:   46c0        nop         ; (mov r8, r8)
        100a:   46c0        nop         ; (mov r8, r8)
    
    0000100c <hello>:
        100c:   00000000    andeq   r0, r0, r0
        1010:   00000001    andeq   r0, r0, r1
        1014:   00000002    andeq   r0, r0, r2
        1018:   00000003    andeq   r0, r0, r3
        101c:   0000100c    andeq   r1, r0, r12
    

    Either way, r0 ends up holding the address of the start of the data, and you can then offset into that data from the caller or wherever.
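
    Applied to the code in the question, the adr approach might look like the sketch below. It assumes GNU as and keeps the question's label names. The addresses in the comments assume the code starts at address 0, as in the question; they are worked out by hand, not taken from a build.

    ```asm
        .arch armv5te
        .syntax unified
        .text
        .thumb
        .thumb_func
    thumbnow:
        push {lr}               @ 0x0
        adr  r0, loadValues     @ 0x2: add r0, pc, #8 -> Align(0x2 + 4, 4) + 8 = 0xC
        ldr  r1, [r0]           @ 0x4: r1 = 0xdeadbee1
        ldr  r2, [r0, #4]       @ 0x6: r2 = 0xdeadbee2
        ldr  r3, [r0, #8]       @ 0x8: r3 = 0xdeadbee3
        pop  {pc}               @ 0xA
        .align 2                @ pad to a 4-byte boundary (0xC)
    loadValues:
        .word 0xdeadbee1
        .word 0xdeadbee2
        .word 0xdeadbee3
    ```

    In 16-bit Thumb, adr can only reach a word-aligned label that lies after the instruction and within 1020 bytes of it, so the data has to stay close to the code and behind a .align.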

    Edit

    .thumb
    adr r0,hello
    nop
    nop
    nop
    
     arm-none-eabi-as so.s -o so.o
    so.s: Assembler messages:
    so.s:2: Error: address calculation needs a strongly defined nearby symbol
    

    So the tool won't turn that into a load from the pool for you.
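
    By contrast, ldr with = does accept a label that is not defined in the same file. A minimal sketch, assuming GNU as and assuming hello is defined in some other object file: the assembler leaves a relocation on the literal pool word and the linker fills in the final address.

    ```asm
        .thumb
        ldr r0,=hello       @ hello is not defined in this file (assumption: defined elsewhere)
        nop
        nop
        nop
        @ the literal pool word holding hello's address is emitted at the end
        @ of the section, or earlier at a .ltorg / .pool directive if one is used
    ```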

    For what you want to do, I think the pc-relative add (adr) is the best you are going to get. You can try other toolchains, since all of this is toolchain specific: assembly language is defined by the assembler, not the target, so each toolchain's assembler can differ in the language it accepts. Within the GNU tools, how the linker and assembler work together has also changed over time; the linker now patches up things it did not use to.

    You could of course go into the linker and add code to perform this optimization. The problem is most likely that, by link time, the linker is only looking to resolve an address in the pool, which is easy for it because it doesn't have to change the instruction. The assembler would have to leave information for the linker saying that this is not just a "fill this memory location with an address" item. Alternatively, you could modify gas to allow adr to work here, and have the linker bail out with an error if it can't resolve the address within the instruction.

    Or you could just hard-code what you want and maintain it. I am not sure why the adr solution isn't adequate.

    mov r0,#8 is a valid thumb instruction.
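
    A sketch of the hard-coded variant, assuming the data really does end up at address 0x8 in the final image, as in the question. Under .syntax unified (which the question uses) the 16-bit Thumb immediate move sets the flags and is therefore spelled movs; mov r0,#8 is the older divided-syntax spelling of the same instruction.

    ```asm
        .syntax unified
        .thumb
        movs r0,#8          @ r0 = 0x8, hard-coded address of the data (assumption)
        ldr  r1,[r0]        @ first word
        ldr  r2,[r0,#4]     @ second word
        ldr  r3,[r0,#8]     @ third word
    ```

    The cost is maintenance: if anything placed before the data changes size, the constant has to be updated by hand.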