x86 arm64 disassembly machine-code opcode

Extracting opcodes from arm64 (ARMv8.5-A) instructions


I need to come up with some wildcard patterns to match certain instructions.

For instance, on x86, to match most CALL instructions the following pattern will suffice:

E8 ?? ?? ?? ??

where E8 is the opcode and the following 4 bytes are the relative offset of the call target.

But the opcodes for arm64, specifically the A64 instruction set, seem much more confusing. For example, I have these two instructions:

0A696938               ldrb       w10, [x8, x9]
EB834039               ldrb       w11, [sp, #0xc0 + var_A0]

There's no obvious relation if you just look at the encodings 0A696938 and EB834039. How do I get the opcode for this ldrb instruction so that I can create a pattern like the one for x86?

It feels like the opcodes do not take up a whole byte the way they do on x86.


Solution

  • That's right: the ARM64 instruction encoding scheme is a lot more complicated than the x86 one. Indeed, the opcode is not always found in the same byte, nor even the same set of bits, nor even a contiguous set of bits within the 32-bit instruction word.

    The instruction encoding rules are defined in Chapter C4 of the Armv8-A Architecture Reference Manual. That chapter is 328 pages long, which gives you some idea of what you are up against. However, if you are just trying to identify some particular instruction or class of instructions, it may not be too bad. There is a rough hierarchical grouping. For example, all branch instructions have 101 in bits 26-28. If additionally bits 29-30 are 00, it is an unconditional branch to an immediate address. If additionally bit 31 is 1, it is a BL. So your equivalent of looking for opcode E8 on x86 would be to look for 100101 in bits 26-31 of the instruction word, i.e. to test (word & 0xFC000000) == 0x94000000. Note that the byte strings in your listing are in memory order (little-endian), so 0A696938 is the instruction word 0x3869690A.
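
As a rough illustration of the BL check described above, here is a minimal Python sketch. It assumes the input is a raw buffer of little-endian A64 instructions starting on a 4-byte boundary; the names `is_bl` and `find_bl` are invented for this example and do not come from any library.

```python
import struct

BL_MASK = 0xFC000000   # selects bits 26-31 of the instruction word
BL_VALUE = 0x94000000  # 100101 in bits 26-31


def is_bl(word):
    """Return True if the 32-bit instruction word encodes a BL."""
    return (word & BL_MASK) == BL_VALUE


def find_bl(code):
    """Yield (offset, target_offset) for every BL in a little-endian buffer.

    Offsets are relative to the start of the buffer (an assumption of this
    sketch; add the load address yourself if you need absolute addresses).
    """
    for off in range(0, len(code) - 3, 4):
        (word,) = struct.unpack_from("<I", code, off)
        if is_bl(word):
            imm26 = word & 0x03FFFFFF
            if imm26 & 0x02000000:      # sign-extend the 26-bit immediate
                imm26 -= 0x04000000
            yield off, off + imm26 * 4  # the immediate counts 4-byte words


# Hypothetical test buffer: nop, bl +8, bl -4 (bytes in memory order).
sample = bytes.fromhex("1f2003d5" "02000094" "ffffff97")
for offset, target in find_bl(sample):
    print(f"BL at {offset:#x} -> {target:#x}")
```

Seen as bytes in memory, this pattern leaves the first three bytes fully wild and constrains only the top six bits of the fourth byte, so a byte-granular wildcard such as `?? ?? ?? 94` would miss BL instructions whose last byte is 95, 96, or 97. That is why a mask-and-compare on the whole word is used here instead.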

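    Some other groups are harder. For instance, all loads and stores have 1 in bit 27 and 0 in bit 25, but the rest of the opcode is scattered through bits 10-11, 16-21, 23-24, 26, and 28-31. Your two ldrb instructions are in fact two different encodings of LDRB (register offset and unsigned immediate offset), which is why their encodings look unrelated beyond the top few bits.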
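
To make this concrete for the two ldrb encodings in the question, here is a small Python sketch using (mask, value) pairs. The masks cover only the register-offset and unsigned-immediate forms of LDRB; other forms (pre-index, post-index, unscaled) are deliberately left out, and the helper names are invented for this example.

```python
import struct

# (mask, value) pairs: an instruction matches when (word & mask) == value.
LOAD_STORE_GROUP = (0x0A000000, 0x08000000)   # bit 27 = 1, bit 25 = 0
LDRB_REGISTER = (0xFFE00C00, 0x38600800)      # ldrb Wt, [Xn, Xm{, extend}]
LDRB_UNSIGNED_IMM = (0xFFC00000, 0x39400000)  # ldrb Wt, [Xn|SP, #imm12]


def word_from_listing(hex_bytes):
    """Convert memory-order hex bytes (as shown in a listing) to a 32-bit word."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


def matches(word, pattern):
    mask, value = pattern
    return (word & mask) == value


def classify_ldrb(word):
    """Name the LDRB form, or return None if it is neither of the two covered."""
    if matches(word, LDRB_REGISTER):
        return "ldrb (register)"
    if matches(word, LDRB_UNSIGNED_IMM):
        return "ldrb (immediate, unsigned offset)"
    return None


def unsigned_imm_fields(word):
    """Extract (rt, rn, imm12) from the unsigned-immediate form."""
    return word & 0x1F, (word >> 5) & 0x1F, (word >> 10) & 0xFFF


for listing in ("0A696938", "EB834039"):
    word = word_from_listing(listing)
    print(f"{listing} -> {word:#010x}: {classify_ldrb(word)}")
```

Both words fall in the load/store group, but the fixed bits that distinguish the two forms differ: the register form pins bits 21-31 and 10-11, while the unsigned-immediate form pins only bits 22-31.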

    If you need to recognize instructions more broadly, rather than just a couple of specific ones, you may want to use an existing disassembly library such as Capstone.