Tags: assembly, x86, x86-64, instruction-encoding

How to encode an instruction when we just know the hex for opcode


This source gives the hex for cmp r/m16/32, imm8 as 0x837. Somewhere I read that the code for ebp is 0b0101. With this information, how can I encode the instruction cmp dword [ebp-4], 2? I have been searching for a couple of hours and found no clues except for this (page 61): a similar instruction, cmpb $0xf,(%rdi), encoded as 80 3f 0f. But I can't make sense of it, because the first source I mentioned says that 0x803 is for sbb. Also, rdi is 0b0111 and not 0b1111 (which is actually r15). I'm getting confused... If possible, I would like the encoded instruction for both x86-32 and x86-64.


Solution

  • The documentation you linked is not very useful unless you already know how the x86 instruction encoding works. So let's try another one. Following this resource (scraped from Intel's official PDF manuals, see footnote 1) we get the following candidate encodings:

    81 /7 iw    CMP r/m16,imm16     Compare imm16 with r/m16.
    81 /7 id    CMP r/m32,imm32     Compare imm32 with r/m32.
    83 /7 ib    CMP r/m16,imm8      Compare imm8 with r/m16.
    83 /7 ib    CMP r/m32,imm8      Compare imm8 with r/m32.
    

    One thing to note here is that the same encoding is given both for word and dword operation. This is not a mistake: the operand size is determined by the current code segment's default operand width (i.e. whether we are running in 16, 32, or 64 bit mode) and the presence of a 66 or REX.W operand size override prefix. The rules are simple:

    • in 16 bit mode, the default operand size is 16 bit
    • in 32 and 64 bit mode, the default operand size is 32 bit
    • the 66 prefix switches between 16 and 32 bit operand size
    • in 64 bit mode, the REX.W prefix switches to 64 bit operand size

    So when programming in 32 or 64 bit mode, no prefix is needed as the default operand size is already the one we want.
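
    As a rough illustration of the four rules above, here is a minimal Python sketch. The function name operand_size and its parameters are made up for this example; it only models the 66 and REX.W prefixes and nothing else.

```python
def operand_size(mode: int, has_66_prefix: bool = False, rex_w: bool = False) -> int:
    """Return the effective operand size in bits (hypothetical helper)."""
    if mode not in (16, 32, 64):
        raise ValueError("mode must be 16, 32 or 64")
    if rex_w:
        if mode != 64:
            raise ValueError("REX prefixes exist only in 64 bit mode")
        return 64  # REX.W wins even if a 66 prefix is also present
    default = 16 if mode == 16 else 32
    if has_66_prefix:
        # The 66 prefix toggles between the 16 and 32 bit operand sizes.
        return 32 if default == 16 else 16
    return default


print(operand_size(32))                      # 32 bit mode, no prefix
print(operand_size(16, has_66_prefix=True))  # 16 bit mode with 66 prefix
```

    Both calls yield 32, which is why cmp dword needs no prefix in 32 bit mode but needs 66 in 16 bit mode.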

    Now the question is whether to use 83 or 81. In this case, both can be used since our immediate fits into 8 bits signed. We'll go ahead with the 83 opcode since the encoding is shorter.
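
    The choice between the two opcodes comes down to a range check on the immediate. A small sketch, with helper names invented for this example:

```python
def fits_signed_8(value: int) -> bool:
    """True if value can be stored in one byte and sign-extended back."""
    return -128 <= value <= 127


def pick_cmp_opcode(imm: int) -> int:
    """Pick the shorter 83 /7 ib form when possible, else 81 /7 id."""
    return 0x83 if fits_signed_8(imm) else 0x81


print(hex(pick_cmp_opcode(2)))    # small immediate: short form
print(hex(pick_cmp_opcode(200)))  # 200 does not fit in a signed byte
```

    Note that 200 would fit in an unsigned byte, but the imm8 of opcode 83 is sign-extended, so it has to go through the 81 form.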

    The encoding 83 /7 ib tells us that the opcode is 83 followed by a modr/m byte with reg = 7 (the other fields encode the r/m32 operand) followed by an 8 bit immediate.

    The r/m32 operand [ebp-4] can be looked up in the modr/m byte table given in the reference you linked. We have a memory operand with base plus displacement addressing: base register ebp and displacement -4. The displacement fits into 8 bits signed, so we use the [ebp+disp8] entry in the table (mod = 01, r/m = 101) and, with reg = 7 (111), end up at 01 111 101 = 7d for the modr/m byte. Following this comes the displacement byte, 0xfc for −4 in two's complement.
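
    If you would rather compute the modr/m byte than look it up, it is just three bit fields packed into one byte: mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0. A minimal sketch (the helper names modrm and disp8 are made up for this example):

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three modr/m fields into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def disp8(value: int) -> int:
    """Two's complement encoding of a signed 8 bit displacement."""
    if not -128 <= value <= 127:
        raise ValueError("displacement does not fit in 8 bits")
    return value & 0xFF


EBP = 0b101  # register number of ebp in the r/m field

# mod = 01 selects [reg + disp8]; reg = 7 is the /7 opcode extension for cmp.
print(hex(modrm(mod=0b01, reg=7, rm=EBP)))  # 0x7d
print(hex(disp8(-4)))                       # 0xfc
```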

    Putting it all together we get 83 7d fc 02 as the encoding for cmp dword ptr [ebp-4], 2:

    83  opcode
    7d  modr/m byte: reg = 7, r/m = [ebp+disp8]
    fc  displacement: -4
    02  immediate: 2
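
    To double check the breakdown, the four bytes can be split back into their fields. This is only a sketch for this one instruction shape (opcode 83, modr/m with mod = 01 and no SIB byte, disp8, imm8), not a general decoder, and the function names are invented for the example:

```python
def signed8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as a two's complement value."""
    return byte - 256 if byte >= 0x80 else byte


def decode_cmp_83(code: bytes) -> dict:
    """Split '83 modrm disp8 imm8' into its fields."""
    if len(code) != 4 or code[0] != 0x83:
        raise ValueError("expected 4 bytes starting with opcode 83")
    modrm_byte = code[1]
    mod = modrm_byte >> 6
    reg = (modrm_byte >> 3) & 7
    rm = modrm_byte & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [reg+disp8] form without SIB is handled")
    return {
        "mod": mod,
        "reg": reg,
        "rm": rm,
        "disp": signed8(code[2]),
        "imm": signed8(code[3]),
    }


print(decode_cmp_83(bytes.fromhex("837dfc02")))
```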
    

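    Notably, the byte sequence is the same in 32 and 64 bit mode. One caveat: in 64 bit mode the default address size is 64 bit, so these bytes address [rbp-4]; a 67 prefix would be needed there to use ebp as the base. In 16 bit mode, extra 66 and 67 prefixes are needed to select a 32 bit operand and address size, giving 66 67 83 7d fc 02.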
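
    Putting the prefix handling and the body together, a hedged sketch of an encoder for just this instruction shape could look as follows. The function name and its parameters are invented for the example, and it only covers the cmp dword [base+disp8], imm8 case discussed here.

```python
def encode_cmp_dword_ebp_disp8(disp: int, imm: int, mode: int) -> bytes:
    """Encode cmp dword [ebp+disp], imm for a given CPU mode (sketch)."""
    if not (-128 <= disp <= 127 and -128 <= imm <= 127):
        raise ValueError("disp and imm must fit in 8 bits signed")
    # 83 = opcode, 7d = modr/m (mod 01, reg 7, r/m 101)
    body = bytes([0x83, 0x7D, disp & 0xFF, imm & 0xFF])
    if mode == 16:
        # 66: 32 bit operand size, 67: 32 bit address size
        return b"\x66\x67" + body
    if mode in (32, 64):
        # No prefix needed. In 64 bit mode the default address size is
        # 64 bit, so r/m = 101 then means rbp rather than ebp.
        return body
    raise ValueError("mode must be 16, 32 or 64")


for mode in (16, 32, 64):
    print(mode, encode_cmp_dword_ebp_disp8(-4, 2, mode).hex(" "))
```

    It is worth feeding the resulting bytes to a disassembler to confirm them.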


    Footnote 1: In Intel's Software Developer Manuals (SDM), the operand size attribute of instructions is described in volume 1 ch. 3.6, and the 66h prefix to select the non-default one for the mode is described in volume 2 ch. 2.1.1 Group 3.

    Other encoding details are mostly in the vol.2 manual. Sites like https://www.felixcloutier.com/x86/ and https://c9x.me/x86/ are scraped from Intel's vol.2 PDF, but only include the entries for each instruction, not the intro stuff that tells you the basics that apply to every instruction, or how to read the details of the entries. See also "How to read the Intel Opcode notation".