Tags: assembly, arm, gdb, cpu-registers

Why does the ARM CPU keep running when it reaches the non-instruction words emitted by .ltorg?


I am learning ARM assembly language, using QEMU's vexpress-a9 machine as a virtual ARM CPU and GNU as as the assembler. This is my code:

... @ some vector table code

.section .text
Reset_Handler: @ 0x60010120
    @ldr sp, = SRAM_BASE
    ldr r10, =0x1111111 @ I know it is 0x01111111
    ldr r12, =0x2222222
    ldr r5,  =0x3333333
.ltorg
    ldr r11, =0x4444444
    ldr r11, =0x5555555
stop:
    b stop

After assembling, linking, running objcopy, and loading the resulting .bin file in QEMU, the code starts at RAM address 0x60010120.

@ Output of the gdb command x/20x 0x60010120
0x60010120: 0xe59fa004      0xe59fc004      0xe59f5004      0x01111111
0x60010130: 0x02222222      0x03333333      0xe59fb004      0xe59fb004
0x60010140: 0xeafffffe      0x04444444      0x05555555      0x00000000

The words at addresses 0x6001012C to 0x60010134 are the constants I set in the code. I expected the program to crash at 0x6001012C, since that word is data, not an instruction.

However, the program reached the stop: b stop instruction. I single-stepped from Reset_Handler, and the gdb output confused me.

(gdb) ni
_Reset () at startup.s:8
8           b Reset_Handler
(gdb) ni
SRAM_BASE () at startup.s:22
22          ldr r10, =0x1111111
(gdb) i r pc
pc             0x60010120          0x60010120 <SRAM_BASE>
(gdb) ni
23          ldr r12, =0x2222222
(gdb) i r pc
pc             0x60010124          0x60010124 <SRAM_BASE+4>
(gdb) ni
24          ldr r5,  =0x3333333
(gdb) i r pc
pc             0x60010128          0x60010128 <SRAM_BASE+8>
(gdb) ni
22          ldr r10, =0x1111111
(gdb) i r pc
pc             0x6001012c          0x6001012c <SRAM_BASE+12>
(gdb) ni
23          ldr r12, =0x2222222
(gdb) i r pc
pc             0x60010130          0x60010130 <SRAM_BASE+16>
(gdb) ni
24          ldr r5,  =0x3333333
(gdb) i r pc
pc             0x60010134          0x60010134 <SRAM_BASE+20>
(gdb) ni
26          ldr r11, =0x4444444
(gdb) i r pc
pc             0x60010138          0x60010138 <SRAM_BASE+24>
(gdb) ni
27          ldr r11, =0x5555555
(gdb) i r pc
pc             0x6001013c          0x6001013c <SRAM_BASE+28>
(gdb) ni
stop () at startup.s:29
29          b stop

As we can see, the ldr instructions before .ltorg appear to execute twice. Why does gdb report ldr r10, =0x1111111 (line 22) when the word in RAM at that address is the data 0x01111111? I expected the program to crash there.


Solution

  • In short, you got lucky: 0x01111111, 0x02222222 and 0x03333333 all happen to decode as valid instructions.
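
To make the "valid instructions" point concrete, here is a minimal C sketch. It is not part of the original test, and the helper names `cond_field` and `cond_passes` are made up for illustration. It reads the condition field in bits 31:28 of an A32 word and evaluates it against the N, Z, C and V flags of a CPSR value. All three pool words carry condition 0, which is EQ, so they only have an effect when the Z flag is set; otherwise the core skips them like a NOP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition field of an A32 instruction word: bits 31:28. */
unsigned cond_field(uint32_t insn)
{
    return insn >> 28;
}

/* Would an instruction with this condition execute, given the flags in cpsr?
   The flags live in CPSR bits 31 (N), 30 (Z), 29 (C) and 28 (V). */
bool cond_passes(uint32_t insn, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u;
    bool z = (cpsr >> 30) & 1u;
    bool c = (cpsr >> 29) & 1u;
    bool v = (cpsr >> 28) & 1u;

    switch (cond_field(insn)) {
    case 0x0: return z;             /* eq */
    case 0x1: return !z;            /* ne */
    case 0x2: return c;             /* cs */
    case 0x3: return !c;            /* cc */
    case 0x4: return n;             /* mi */
    case 0x5: return !n;            /* pl */
    case 0x6: return v;             /* vs */
    case 0x7: return !v;            /* vc */
    case 0x8: return c && !z;       /* hi */
    case 0x9: return !c || z;       /* ls */
    case 0xa: return n == v;        /* ge */
    case 0xb: return n != v;        /* lt */
    case 0xc: return !z && n == v;  /* gt */
    case 0xd: return z || n != v;   /* le */
    default:  return true;          /* al (0xe); 0xf is the unconditional space */
    }
}
```

Under these assumptions `cond_field(0x01111111u)` is 0 (EQ) and `cond_passes(0x01111111u, 0)` is false, while an always-executed word such as 0xe59fa004 passes regardless of the flags.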

    Now let's elaborate. I ran the following code on ARMv7 (Cortex-A9, Zynq-7000 SoC).

    void test_so() __attribute__((naked));
    void test_so()
    {
        asm volatile
        (
            "ldr r0, =0x1111111 \n\t"
            "ldr r1, =0x2222222 \n\t"
            "ldr r2, =0x3333333 \n\t"
            ".ltorg             \n\t"
            "add r3, r0, r1     \n\t"
            // crash it
            "mov r3, #0         \n\t"
            "ldr r3, [r3]       \n\t"
    
            :::"memory", "r0", "r1", "r2", "r3"
        );
    }
    ...
    
    printf("test start.\n");
    test_so();
    printf("test end.\n");
    
    

    Disassembly of test_so with GNU objdump:

    01a281b4 <test_so()>:
     1a281b4:   e59f0004    ldr r0, [pc, #4]    ; 1a281c0 <test_so()+0xc>
     1a281b8:   e59f1004    ldr r1, [pc, #4]    ; 1a281c4 <test_so()+0x10>
     1a281bc:   e59f2004    ldr r2, [pc, #4]    ; 1a281c8 <test_so()+0x14>
     1a281c0:   01111111    tsteq   r1, r1, lsl r1
     1a281c4:   02222222    eoreq   r2, r2, #536870914  ; 0x20000002
     1a281c8:   03333333    teqeq   r3, #-872415232 ; 0xcc000000
     1a281cc:   e0803001    add r3, r0, r1
     1a281d0:   e3a03000    mov r3, #0, 0
     1a281d4:   e5933000    ldr r3, [r3]
    

    As you can see, objdump disassembles the values in the literal pool as instructions.
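
The `[pc, #4]` operands and the addresses in the objdump comments follow from how A32 reads the PC: it is the instruction's own address plus 8. Below is a rough C sketch of that calculation. The function name `ldr_literal_target` is hypothetical, and it only recognizes the plain offset form of LDR (literal), without writeback.

```c
#include <stdbool.h>
#include <stdint.h>

/* If insn is an A32 "ldr Rt, [pc, #+/-imm12]", store the address it loads
   from in *target and return true; otherwise return false. */
bool ldr_literal_target(uint32_t insn_addr, uint32_t insn, uint32_t *target)
{
    /* Fixed bits: word load, Rn = pc, P = 1, W = 0. The U bit (23) is
       left out of the mask because it only selects add or subtract. */
    if ((insn & 0x0f7f0000u) != 0x051f0000u)
        return false;

    uint32_t pc    = (insn_addr + 8u) & ~3u;  /* PC reads as address + 8, word aligned */
    uint32_t imm12 = insn & 0xfffu;

    *target = (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
    return true;
}
```

For the first instruction above, `ldr_literal_target(0x01a281b4, 0xe59f0004, &t)` gives 0x01a281c0, the address of the 0x01111111 word. The same arithmetic maps 0xe59fa004 at 0x60010120 in the question to 0x6001012c.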

    The result of running this code, with its intentional crash, is:

    test start.
    
    ...
    Type: Data Abort
    ...
    ---
    r0: 0x1111111
    r1: 0x2222222
    r2: 0x3333333
    r3: 0x0
    ...
    r13(sp): 0x149b8
    r14(lr): 0x1a28200
    r15(pc): 0x1a281d4  <-- address of Instruction causing a crash
    ---
    

    So the CPU executed the instructions in question (which are really data in the literal pool) and crashed as planned at
    1a281d4: e5933000 ldr r3, [r3] (null dereference, r3 is zero)



    For extra fun, let's trigger an undefined instruction abort with the following code:

        asm volatile
        (
            "ldr r0, =0x1111111 \n\t"
            "ldr r1, =0x2222222 \n\t"
            "ldr r2, =0x3333333 \n\t"
            ".ltorg             \n\t"
            "add r3, r0, r1     \n\t"
            // crash it
            "udf #1             \n\t"  // undefined instruction
    
            :::"memory", "r0", "r1", "r2", "r3"
        );
    

    The disassembly is much the same, except that the null dereference is replaced with the undefined instruction udf:

    01a281b4 <test_so()>:
     1a281b4:   e59f0004    ldr r0, [pc, #4]    ; 1a281c0 <test_so()+0xc>
     1a281b8:   e59f1004    ldr r1, [pc, #4]    ; 1a281c4 <test_so()+0x10>
     1a281bc:   e59f2004    ldr r2, [pc, #4]    ; 1a281c8 <test_so()+0x14>
     1a281c0:   01111111    tsteq   r1, r1, lsl r1
     1a281c4:   02222222    eoreq   r2, r2, #536870914  ; 0x20000002
     1a281c8:   03333333    teqeq   r3, #-872415232 ; 0xcc000000
     1a281cc:   e0803001    add r3, r0, r1
     1a281d0:   e7f000f1    udf #1
    

    Running this code crashes like this:

    test start.
    ...
    Type: Undefined Instruction Abort
    ...
    ---
    r0: 0x1111111
    r1: 0x2222222
    r2: 0x3333333
    r3: 0x3333333
    ...
    r13(sp): 0x149b8
    r14(lr): 0x1a281fc
    r15(pc): 0x1a281d0   <-- address of Instruction causing a crash
    ---
    

    So in this case it is a real undefined instruction abort, caused by
    1a281d0: e7f000f1 udf #1

    PS: It seems that my first assumption, a buggy emulator, was wrong after all.


    PPS: `llvm-objdump` is 'less parsy', though: it does not convert the literal pool into instructions, which is kinda annoying in this case:

    test_so():
     1a281b4:   04 00 9f e5     ldr r0, [pc, #4]
     1a281b8:   04 10 9f e5     ldr r1, [pc, #4]
     1a281bc:   04 20 9f e5     ldr r2, [pc, #4]
    $d.4:
     1a281c0:   11 11 11 01     .word   0x01111111
     1a281c4:   22 22 22 02     .word   0x02222222
     1a281c8:   33 33 33 03     .word   0x03333333
    $a.5:
     1a281cc:   01 30 80 e0     add r3, r0, r1
     1a281d0:   f1 00 f0 e7     udf #1