How does the Microprocessor detect if it's in-between an instruction?


I'm using an STM32F401RE (ARM Cortex-M4, 32-bit RISC) and was curious about the following.

Normally, instructions on a 32-bit ARM can be 2 bytes or 4 bytes long. I accidentally jumped in-between a 2-byte instruction, and the microprocessor instantly went into an infinite Error Handler loop afterwards.

I later tested this by jumping on purpose in-between both a 4-byte and a 2-byte instruction, and the microprocessor would always go into the Error Handler.

I used the following C code to jump to memory addresses.

void (*foo)(void) = (void (*)(void))0x80002e8;
foo();

The addresses for functions and instructions are from the disassembly. The compiler used the following assembler instruction after storing the address in r3.

blx     r3

Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?
Especially in the case of the 16-bit Thumb instructions, which are already pretty cramped.

I have multiple guesses but want to know what exactly is going on.


Solution

  • Normally, instructions on a 32-bit ARM can be 2 bytes or 4 bytes long.

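    Only in Thumb-2; in the original Thumb instruction set they are all 2 bytes (with BL encoded as a pair of 2-byte halves), and in ARM ("A32") state they are all 4 bytes. The Cortex-M4 executes Thumb-2 only.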

    Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?

    It can't. If the 2 upper bytes of a 4-byte instruction happen to form a valid 2-byte instruction and you jump there, it will be executed as such. In your case, these upper 2 bytes probably were all invalid instructions, resulting in a fault exception.

    For example, the program

    .code 16
    .syntax unified
    
    test4byte:
        mov.w r0, #0x88000000
        
    test2byte:
        ands r0, r1
    

    will be assembled into

    00000000 <test4byte>:
       0:   f04f 4008   mov.w   r0, #2281701376 ; 0x88000000
    
    00000004 <test2byte>:
       4:   4008        ands    r0, r1
    

    or as a byte-wise hex dump

    4f f0 08 40 08 40
    

    As you can see, the byte sequence 08 40 occurs twice: once as the upper 2 bytes of the mov.w and once as the ands instruction. The two are bit-for-bit identical, so the processor has no way to tell them apart.
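To make this concrete, here is a minimal host-side sketch in C (not code for the target) of how a Thumb-2 decoder picks the instruction length. It relies on the Thumb-2 encoding rule that a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 4-byte instruction, and anything else is a complete 2-byte instruction. The names `thumb2_length`, `read_halfword`, `walk_from` and `example_code` are made up for this illustration. The point is that the length is derived only from the halfword at the address where decoding starts, so nothing marks a halfword as "the second half of something".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the Thumb-2 instruction whose FIRST halfword is hw.
 * Bits [15:11] equal to 0b11101, 0b11110 or 0b11111 start a 4-byte
 * instruction; every other value is a complete 2-byte instruction. */
static size_t thumb2_length(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Read one little-endian halfword from a byte buffer. */
static uint16_t read_halfword(const uint8_t *code, size_t offset)
{
    return (uint16_t)(code[offset] | (code[offset + 1] << 8));
}

/* Hypothetical helper: starting at byte offset `start`, record the offset
 * of every instruction a decoder would see. Returns how many were found. */
static size_t walk_from(const uint8_t *code, size_t size, size_t start,
                        size_t *offsets, size_t max_offsets)
{
    size_t n = 0;
    size_t pc = start;

    assert(start % 2 == 0); /* instructions are halfword-aligned */
    while (pc + 2 <= size && n < max_offsets) {
        size_t len = thumb2_length(read_halfword(code, pc));
        if (pc + len > size)
            break; /* truncated 4-byte instruction at the end */
        offsets[n++] = pc;
        pc += len;
    }
    return n;
}

/* The byte dump from the example: mov.w r0, #0x88000000 ; ands r0, r1 */
static const uint8_t example_code[6] = {0x4f, 0xf0, 0x08, 0x40, 0x08, 0x40};
```

Calling `walk_from(example_code, sizeof example_code, 0, ...)` yields instruction starts at offsets 0 and 4 (the mov.w, then the ands). Starting at offset 2 instead yields offsets 2 and 4: the halfword 0x4008 at offset 2 is decoded as a 2-byte instruction in its own right.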

    In a program that just contained the shown mov.w instruction, if you jumped to address 0, the mov.w would be executed; if you jumped to address 2, an ands would be executed, even though it doesn't appear in the assembly code.
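One more thing worth checking in the experiment from the question, which may explain the fault regardless of what the bytes at the target decode to: on Cortex-M, the address given to `blx r3` must have bit 0 set to stay in Thumb state. The address 0x80002e8 is even, so the branch requests ARM state, which the Cortex-M4 does not support. That raises a UsageFault (INVSTATE), which escalates to a HardFault if the UsageFault handler is not enabled. The sketch below, with the made-up helper names `has_thumb_bit` and `thumb_target`, shows the address arithmetic only; it runs on a host and does not perform the jump.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if a branch target has bit 0 set, i.e. it selects Thumb state
 * when used with BX/BLX. */
static int has_thumb_bit(uintptr_t target)
{
    return (target & 1u) != 0;
}

/* Hypothetical helper: turn the (even) address of an instruction, as shown
 * in a disassembly listing, into a value usable as a Thumb branch target. */
static uintptr_t thumb_target(uintptr_t code_addr)
{
    assert(code_addr % 2 == 0); /* instruction addresses are halfword-aligned */
    return code_addr | 1u;
}

/* On the target, the call from the question would then be written as
 * (assumption: 0x80002e8 is the instruction address from the listing):
 *
 *     void (*foo)(void) = (void (*)(void))thumb_target(0x80002e8u);
 *     foo();
 */
```

With bit 0 set, the jump lands on the same halfword at 0x80002e8 and stays in Thumb state, so what happens next depends only on how the bytes at that address decode, as described above.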