assembly x86 machine-code instruction-encoding

Encoding x86-16 instruction with immediate operand

I am venturing into decoding instructions, for now on 16-bit 80x86 machines. I don't have many problems decoding instructions that do not have an immediate value as their source operand. The problem arises when the source operand is no longer a register or a memory location but an immediate value. I would decode the following instruction this way:

mov ax, 3

101110|11| |11|000|000| 00000011 -> 3 with sign extension
      |       |   |  register AX
      s = 1   |  null
      w = 1   |
              the second operand is a register

However, that is not correct. This is the right encoding:

mov ax, 3

10111000 00000011 00000000

Can someone explain to me how decoding works assuming the source operand is an immediate value?

Solution

Decoding for the x86 works by consulting tables.

Suppose you are given a byte that you already know is an instruction opcode (and not an instruction prefix). If that byte holds the value B8h (10111000b), the table(s) show that it stands for mov ax, imm16.

opcode       immediate
-----------  --------------------
1011_1_000b, 00000011b, 00000000b
     ^  ^
     |  AX
     w-bit

In your first snippet, you try to dissect the BBh (10111011b) opcode, but if you consult the same table(s), you will see that it stands for mov bx, imm16.

opcode       immediate
-----------  --------------------
1011_1_011b, 00000011b, 00000000b
     ^  ^
     |  BX
     w-bit
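The two diagrams above can be turned into a few lines of code. The following is only a minimal sketch in Python, not a full decoder: the function name decode_mov_reg_imm and the two register lists are names made up for this illustration, and it assumes the first byte is already known to be an opcode in the range B0h..BFh.

```python
# Register names indexed by the 3-bit reg field (hypothetical helper tables).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_reg_imm(code: bytes):
    """Decode the short form 1011_w_reg followed by imm8 or imm16.

    Returns a (text, length_in_bytes) tuple.
    """
    opcode = code[0]
    if opcode & 0xF0 != 0xB0:
        raise ValueError("not an opcode in the range B0h..BFh")
    w = (opcode >> 3) & 1        # bit 3: 0 = byte, 1 = word
    reg = opcode & 0b111         # bits 2..0: register number
    if w:
        imm = code[1] | (code[2] << 8)   # imm16 is stored little-endian
        return f"mov {REG16[reg]}, {imm}", 3
    return f"mov {REG8[reg]}, {code[1]}", 2


print(decode_mov_reg_imm(bytes([0xB8, 0x03, 0x00])))
print(decode_mov_reg_imm(bytes([0xBB, 0x03, 0x00])))
```

Note that the immediate follows the opcode directly; there is no modr/m byte in this form.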

There is, however, a second way to encode the mov ax, imm16 instruction, using a modr/m byte like you tried to do in your first snippet:

opcode      modr/m       immediate
----------  -----------  --------------------
1100011_1b, 11_000_000b, 00000011b, 00000000b
        ^   ^   ^   ^
        |   |   |   r/m is AX
        |   |   reg field must be 000
        |   mod=11b => r/m is register
        w-bit

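This opcode does not have an s-bit, so no sign extension is available and the immediate always takes the full operand size. The instruction is therefore 4 bytes long instead of 3, which is why assemblers that care about code size seldom use this encoding.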
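For comparison, here is a rough Python sketch of decoding the modr/m form. The name decode_mov_rm_imm and the register lists are invented for this example, and the sketch deliberately handles only mod=11b (register operand); memory operands would need displacement handling that is left out here.

```python
# Register names indexed by the 3-bit r/m field when mod=11b (hypothetical tables).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_rm_imm(code: bytes):
    """Decode 1100011_w, modr/m, imm8/imm16 for register operands only.

    Returns a (text, length_in_bytes) tuple.
    """
    opcode, modrm = code[0], code[1]
    if opcode & 0xFE != 0xC6:
        raise ValueError("not opcode C6h or C7h")
    w = opcode & 1               # bit 0: 0 = byte, 1 = word
    mod = modrm >> 6             # bits 7..6
    reg = (modrm >> 3) & 0b111   # bits 5..3: must be 000 for mov
    rm = modrm & 0b111           # bits 2..0: register when mod=11b
    if reg != 0:
        raise ValueError("reg field must be 000 for mov r/m, imm")
    if mod != 0b11:
        raise NotImplementedError("memory operands are not covered by this sketch")
    if w:
        imm = code[2] | (code[3] << 8)   # imm16 is stored little-endian
        return f"mov {REG16[rm]}, {imm}", 4
    return f"mov {REG8[rm]}, {code[2]}", 3


print(decode_mov_rm_imm(bytes([0xC7, 0xC0, 0x03, 0x00])))
```

The returned length makes the size difference visible: 4 bytes here against 3 bytes for the B8h form of the same instruction.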

A similar pair of encodings exists for the ADD, ADC, SUB, SBB, CMP, AND, OR, XOR, and TEST instructions. For these, however, the short form (the one without the modr/m byte) applies only to the accumulator: AX, or AL for byte operations.
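To make the pairing concrete, here is a small Python sketch that builds both 16-bit encodings for the eight arithmetic/logic instructions. The function name encode_alu_reg16_imm16 and the GROUP1 table are names chosen for this illustration. It assumes a register destination and a full 16-bit immediate, and it leaves out TEST (which uses different opcodes) as well as the sign-extended imm8 form that these instructions also have.

```python
# Value placed in the reg field of the modr/m byte for opcode 81h.
GROUP1 = {"add": 0, "or": 1, "adc": 2, "sbb": 3,
          "and": 4, "sub": 5, "xor": 6, "cmp": 7}
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def encode_alu_reg16_imm16(mnemonic: str, reg: str, imm: int):
    """Return (short_form, long_form); short_form is None unless reg is ax."""
    digit = GROUP1[mnemonic]
    r = REG16.index(reg)
    lo, hi = imm & 0xFF, (imm >> 8) & 0xFF          # little-endian imm16
    # Long form: opcode 81h, modr/m with mod=11b, reg=digit, r/m=register.
    long_form = bytes([0x81, 0xC0 | (digit << 3) | r, lo, hi])
    # Short form: opcode 00_digit_101b, accumulator only, no modr/m byte.
    short_form = bytes([(digit << 3) | 0x05, lo, hi]) if reg == "ax" else None
    return short_form, long_form


short, long_ = encode_alu_reg16_imm16("add", "ax", 3)
print(short.hex(" "), "|", long_.hex(" "))
```

For any register other than AX the function returns None as the short form, which mirrors the restriction described above.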

You can find all the tables you need in the Intel manuals at https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html