assembly arm universal-binary ida

How do you extract values out of ARM ASM bits?


In IDA Pro I see the ARM ASM listed below. What bits is IDA using to get 7200?

A3 F5 E1 53 SUB.W R3, R3, #7200

For convenience, the values in binary are as follows:

7200 = 0x1c20 = 0001 1100 0010 0000

0xA3F5E153 = 1010 0011 1111 0101 1110 0001 0101 0011

Edit: Load the file in IDA, selecting "Mach-O file (DYLIB) ARMv7 [macho.lhc]".


Solution

  • The ARM ARM (ARM Architecture Reference Manual) is a good resource, especially for ARM and thumb instruction encoding. For thumb2, though, look for the ARMv7-M architecture reference manual. Both are free downloads.

    (I know I start this with 0x7200 hex, not 7200 decimal. That's okay, it all works out in the end.)

    A sub r3,r3, #0x7200 is encoded as follows (for ARM).

    e2433c72 sub r3, r3, #29184 ; 0x7200

    The E means always execute.

    The upper three bits of the 2 indicate data processing immediate, with no other fixed bits. The lower bit of the 2 and the upper three bits of the 4 are 0010, which means sub. The lower bit of the 4 is the S bit, meaning update the flags (it would be a subs instruction if that bit were set). The next two nibbles, 3 and 3, are the two instances of r3. The next four bits, 0xC, are the rotate field, and the lower 8 bits are the immediate.

    The shifter operand is immed_8 rotated right by (rotate_immed * 2), so that is 0x72 rotated right 24 bits, which is the same as rotating left 32-24 = 8 bits, and that makes the immediate 0x7200.
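    The ARM-mode rule just described can be checked with a few lines of Python. This is only a sketch of the immediate expansion; the helper names `ror32` and `arm_dp_immediate` are made up for illustration, and it assumes the word really is a data-processing immediate instruction.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_dp_immediate(insn):
    """Expand the shifter operand of an ARM data-processing immediate word."""
    rotate_imm = (insn >> 8) & 0xF   # bits 11:8, the rotate field
    imm8 = insn & 0xFF               # bits 7:0, the 8-bit immediate
    return ror32(imm8, 2 * rotate_imm)


# e2433c72 is the ARM encoding of sub r3, r3, #0x7200 from the text
print(hex(arm_dp_immediate(0xE2433C72)))
```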

    For thumb2 (which is where the sub.w comes from) it is encoded as follows:

    f5a3 43e4 sub.w r3, r3, #29184 ; 0x7200

    T3 encoding

    0xF1A00000, with or without some bits, is the baseline encoding for SUB.W rd,rn,#const (with a 12-bit immediate field; T4 has a 12-bit immediate as well).

    The 0x4 bit in the 0x5 nibble is the i bit, and it is set, so we need to know that. The S bit is not set: it is a sub, not a subs. The lower three bits of the 0x....4... nibble are the imm3 field and the lower 8 bits are the imm8 field, so in ARM notation our immediate is 1:100:11100100.

    Take the top five of those bits (i, imm3, and the upper imm8 bit): 11001.

    My first reading was to take the bit pattern 11100100 and shift it right by 0b1001 bits:

    00000000011100100......

    0000 0000 0111 0010 0......

    and the constant is

    0x00720000, which is off by a factor of 256. I have to figure that out.

    Hmm, I was doing 0x7200; you are doing decimal 7200, which as you mentioned is 0x1C20.

    So, looking at what your tool is telling you:

    A3 F5 E1 53

    We know we need a 0xF5A3, so the two bytes of that 16-bit halfword are swapped (little-endian), and the other halfword must be swapped too.

    0xF5A353E1

    Which is what I get:

    f5a3 53e1 sub.w r3, r3, #7200 ; 0x1c20
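    The byte swap can be reproduced with Python's `struct` module. This is a small sketch that assumes, as above, that the four bytes IDA shows are two little-endian 16-bit halfwords; `thumb2_halfwords` is a hypothetical helper name.

```python
import struct


def thumb2_halfwords(raw):
    """Split four instruction bytes into two little-endian 16-bit halfwords."""
    return struct.unpack("<HH", raw)


hw1, hw2 = thumb2_halfwords(bytes.fromhex("A3F5E153"))
print(f"{hw1:04x} {hw2:04x}")
```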

    Same T3 encoding.

    0xF5A3 means sub.w something,r3,something with the i bit set. 0x53E1 means sub.w r3,something,something, and the const is 1:101:11100001.
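    Pulling those fields out by hand is error prone, so here is a rough Python sketch of the same extraction. The function name `decode_sub_t3` and the dictionary layout are invented for this example. The mask check only tests the fixed bits of the baseline 0xF1A0 pattern and does not filter out special cases that share it (for example the compare form or SP as the base register).

```python
def decode_sub_t3(hw1, hw2):
    """Extract the fields of a thumb2 SUB (immediate) T3 encoding."""
    # Fixed bits: hw1 = 11110 i 0 1101 S Rn, and hw2 bit 15 = 0
    if (hw1 & 0xFBE0) != 0xF1A0 or (hw2 & 0x8000):
        raise ValueError("not a SUB (immediate) T3 bit pattern")
    fields = {
        "i": (hw1 >> 10) & 1,       # the 0x4 bit of the second nibble
        "S": (hw1 >> 4) & 1,        # set for subs, clear for sub
        "Rn": hw1 & 0xF,
        "imm3": (hw2 >> 12) & 0x7,
        "Rd": (hw2 >> 8) & 0xF,
        "imm8": hw2 & 0xFF,
    }
    # i:imm3:imm8 concatenated into the 12-bit modified immediate
    fields["imm12"] = (fields["i"] << 11) | (fields["imm3"] << 8) | fields["imm8"]
    return fields


f = decode_sub_t3(0xF5A3, 0x53E1)
print(f, format(f["imm12"], "012b"))
```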

    The upper 5 bits are 11011. As a first guess, shift 11100001 right by 0b1011 bits, which is 11:

    0000000000011100001000

    0000 0000 0001 1100 0010 0000 0000 0000

    0x001C2000

    If you are old enough to know Seinfeld, this falls into the yada, yada category: see A5.3.2 of the ARMv7-M manual (Modified immediate constants in Thumb instructions).

    The table there shows 01010 as having two zero bits padded in front. (My first reading was: of those five bits mnopq, throw out the second one, n, leaving mopq as the shift amount, or 0b0010 in this case.)

    Then there is a ..., other stuff, yada yada, and then

    11111 is a shift/pad of 23 bits, 11110 is pad 22, 11101 is pad 21. So dropping one bit does not work across the whole range; the pad is simply the five-bit value minus 8 (the table starts at 01000 with no padding). Working backward, 11100 is pad 20 and 11011 is pad 19.

    And that is what we were looking for: a pad of 19 zero bits before the 1 and the lower 7 imm8 bits.

    0x00001C20
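    The padding view can be written down as a short Python sketch. It covers only the rotated form discussed here (top five bits of the immediate between 01000 and 11111); `expand_rotated` is a hypothetical name, and the pad is taken as the five-bit value minus 8.

```python
def expand_rotated(imm12):
    """Return (pad, value) for the rotated form of a thumb2 modified immediate."""
    rot = (imm12 >> 7) & 0x1F            # top five bits: i, imm3, imm8 bit 7
    if rot < 8:
        raise ValueError("not the rotated form (top two bits are 00)")
    unrotated = 0x80 | (imm12 & 0x7F)    # forced 1 followed by the low 7 bits
    pad = rot - 8                        # zero bits in front of the forced 1
    # For an 8-bit value, rotating right by rot >= 8 is a left shift by 32 - rot
    value = (unrotated << (32 - rot)) & 0xFFFFFFFF
    return pad, value


for imm12 in (0b110111100001, 0b110011100100):
    pad, value = expand_rotated(imm12)
    print(pad, format(value, "032b"), hex(value))
```

    The first immediate is the one from the question (pad 19, 0x1C20) and the second is the 0x7200 example worked earlier.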

    So the thumb2 12-bit constant encoding is a bit painful to follow, with lots of interesting constants you can optimize for. Four bits alone would only give 16 patterns/values, but there are 24 possible pads (0 to 23), so we can't get there with four; all five bits are needed. With the i bit clear (01xxx) the pad runs from 0 to 7, starting at the top of the word, and with the i bit set (1xxxx) it runs from 8 to 23, starting at the midway point.

    So look at the SUB instruction in the ARMv7 manual. Encoding T3 lines up with what you are trying to do. The description says the immediate is built from i:imm3:imm8; take those bits to section A5 of the same manual and look at table A5-1. The T3 encoding calls it a const, not an imm12; the imm12 handling looks to be in the pseudocode after that table.
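    For reference, here is a Python sketch modelled on that pseudocode as I understand it. The function name `thumb_expand_imm` is my own, and the byte-replication cases for a 00 top pair are included from memory of the manual, so check them against the table before relying on them.

```python
def thumb_expand_imm(imm12):
    """Expand a 12-bit thumb2 modified immediate (i:imm3:imm8) to 32 bits."""
    imm12 &= 0xFFF
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0:
        # Top two bits 00: bits 9:8 select a byte replication pattern
        mode = (imm12 >> 8) & 0x3
        if mode == 0:
            return imm8                        # 00000000 00000000 00000000 abcdefgh
        if imm8 == 0:
            raise ValueError("UNPREDICTABLE: replicated byte is zero")
        if mode == 1:
            return (imm8 << 16) | imm8         # 00 XY 00 XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)  # XY 00 XY 00
        return imm8 * 0x01010101               # XY XY XY XY
    # Otherwise: 1:imm8[6:0] rotated right by the top five bits
    unrotated = 0x80 | (imm12 & 0x7F)
    rot = imm12 >> 7
    return ((unrotated >> rot) | (unrotated << (32 - rot))) & 0xFFFFFFFF


print(thumb_expand_imm(0b110111100001))       # the const from f5a3 53e1
print(hex(thumb_expand_imm(0b110011100100)))  # the const from f5a3 43e4
```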

    Also note that you are not using ARM instructions; you are looking at thumb2 instructions. Yes, they are part of the ARM family, but they are different instruction sets, or modes.