assembly x86 cpu-architecture mov instruction-encoding

Why does the opcode for MOV from a segment register not have its low bit set? It's not 8-bit operand-size, so the W bit should be set


I can't understand the encoding of the instruction MOV BX,CS. The first byte is 10001100, where the first 6 bits are the opcode, followed by the 1-bit direction (D) field and then the W bit. My doubt is: why are the opcode bits 100011 and not 100010, and why is the W bit 0 and not 1? Is there a purpose behind this, or some condition I should watch out for?

This is the instruction representation

I would really appreciate it if someone could give me a detailed answer.

I tried working it out with more than one method, but I can't arrive at the encoding shown in the picture above. I have watched a lot of videos but didn't find one that cleared up my doubt.


Solution

  • Some opcodes don't follow that pattern, including the ones for mov to/from segment registers. There's no 8-bit version of those instructions, so the low bit does not represent Width = 8 vs. 16-bit.

    The low 2 bits being Direction and Width is not universal. It's just a pattern that exists in most normal ALU opcodes that come in pairs for 8-bit and 16/32/64-bit operand-size (see footnote 1), like and, add, cmp, etc. Those have both directions, immediate forms with both imm16 and sign-extended imm8 for 16-bit operand-size, and al, imm8 / ax, imm16 short encodings with no ModRM. There are also ones like shr that don't have both directions, but do have both sizes.
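    As a rough illustration of that pattern, here is a small Python sketch that splits an opcode byte into its top 6 bits and its two low bits. The helper name split_dw and the table ADD_FORMS are made up for this example. For the four register/memory forms of add, the low bits really do behave as D and W. For 8C the same split gives 100011 with both low bits 0, which would only mean "D = 0, W = 0" if that opcode followed the pattern.

```python
def split_dw(opcode):
    """Split an opcode byte into (top 6 bits, bit 1, bit 0).

    Bit 1 and bit 0 are D and W only for opcodes that follow the
    ALU-style pattern; the split itself says nothing about that.
    """
    return opcode >> 2, (opcode >> 1) & 1, opcode & 1


# The four register/memory forms of ADD, where bits 1 and 0 are D and W.
ADD_FORMS = {
    0x00: "add r/m8, r8",
    0x01: "add r/m16, r16",
    0x02: "add r8, r/m8",
    0x03: "add r16, r/m16",
}

if __name__ == "__main__":
    for opcode, form in ADD_FORMS.items():
        top6, d, w = split_dw(opcode)
        print(f"{opcode:02X}  {top6:06b} d={d} w={w}  {form}")

    # The byte from the question: same mechanical split, different meaning.
    top6, bit1, bit0 = split_dw(0x8C)
    print(f"8C  {top6:06b} bit1={bit1} bit0={bit0}  mov r/m16, Sreg")
```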

    Even mov for non-segment registers has some special forms: there is a mov reg, imm form with the register number in the low 3 bits of the opcode, instead of direction or width. And there is no mov r/m16, sign_extended_imm8, because 8086 had no use for that. (https://www.felixcloutier.com/x86/mov shows the various mov opcodes.)
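    To make the mov reg, imm form concrete, here is a hedged Python sketch of how those bytes are built for 8086-style 8-bit and 16-bit registers. The function name encode_mov_reg_imm is hypothetical. The base opcodes B0 (8-bit) and B8 (16-bit) are the ones listed on the mov page linked above, so in this form the width is selected by bit 3 and the low 3 bits are the register number.

```python
# Register numbers as used in x86 machine code (index = register number).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def encode_mov_reg_imm(reg, imm):
    """Encode `mov reg, imm` for an 8-bit or 16-bit register.

    16-bit: opcode B8 + register number, then imm16 (little-endian).
    8-bit:  opcode B0 + register number, then imm8.
    """
    if reg in REG16:
        opcode = 0xB8 + REG16.index(reg)
        return bytes([opcode]) + (imm & 0xFFFF).to_bytes(2, "little")
    if reg in REG8:
        opcode = 0xB0 + REG8.index(reg)
        return bytes([opcode, imm & 0xFF])
    raise ValueError(f"unknown register: {reg}")


if __name__ == "__main__":
    print(encode_mov_reg_imm("bx", 0x1234).hex(" "))  # mov bx, 0x1234
    print(encode_mov_reg_imm("bl", 5).hex(" "))       # mov bl, 5
```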


    The mov r/m, Sreg opcode you're looking at is 8C. The four opcodes that differ only in their low 2 bits include mov Sreg, r/m, but also two unrelated opcodes that have the low bit set. On 8086 all four exist only with 16-bit operand-size, so the low bit doesn't really mean W for any of them, although reading it as W = 1 does happen to fit lea and pop.

    • 8C is mov r/m, Sreg, the opcode you're asking about.

    • 8D is lea

    • 8E is mov Sreg, r/m. (So the form with the d bit set does exist, and it works with any segment register other than CS in the reg field of ModRM. Encoding it with a CS destination is possible, but it will raise a #UD exception on CPUs after 8086. 8086 didn't have an illegal-instruction exception; every bit-pattern ran as something. Some 8086 models even ran mov to CS as a jump.)

    • 8F is pop r/m.
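    Putting the list above together with the ModRM byte, here is a minimal Python sketch that encodes the register-to-register forms of 8C and 8E. The helper names are invented for this example, and it only handles mod = 11 (register operands), not memory operands. For the instruction in the question, mov bx, cs, it produces 8C CB.

```python
# Register numbers as used in the ModRM byte (index = register number).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
SREG = ["es", "cs", "ss", "ds"]  # the four 8086 segment registers


def modrm(mod, reg, rm):
    """Pack the ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_reg_from_sreg(dst, sreg):
    """mov r16, Sreg: opcode 8C, Sreg in the reg field, r16 in r/m."""
    return bytes([0x8C, modrm(0b11, SREG.index(sreg), REG16.index(dst))])


def encode_mov_sreg_from_reg(sreg, src):
    """mov Sreg, r16: opcode 8E, Sreg in the reg field, r16 in r/m."""
    if sreg == "cs":
        # Encodable, but faults with #UD on CPUs after 8086.
        raise ValueError("mov to CS is not a usable instruction")
    return bytes([0x8E, modrm(0b11, SREG.index(sreg), REG16.index(src))])


if __name__ == "__main__":
    print(encode_mov_reg_from_sreg("bx", "cs").hex(" "))  # mov bx, cs
    print(encode_mov_sreg_from_reg("ds", "ax").hex(" "))  # mov ds, ax
```

    Note that the segment register always sits in the reg field for both opcodes; only bit 1 of the opcode changes which operand is the destination.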

    See http://ref.x86asm.net/coder32.html#x8C for a list of instructions by opcode. There are many other exceptions to the pattern, like cli and sti being FA and FB respectively.


    Footnote 1: With 16/32/64-bit operand-size all having the same opcode. The default is implied by the current mode. A 66h prefix selects whichever of 16-bit or 32-bit is not the default, and a REX.W prefix selects 64-bit.

    There wasn't room for new opcodes when 386 and AMD64 were being designed, so they went with prefixes instead.
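    As a sketch of how footnote 1 plays out in bytes, the Python below encodes mov ax, bx / mov eax, ebx / mov rax, rbx. All three share opcode 89 and ModRM D8 and differ only in the prefix. The function name and its (operand_size, mode) parameters are assumptions made for this example, not part of any real assembler API.

```python
OPCODE_MOV_RM_R = 0x89  # mov r/m, r with 16/32/64-bit operand-size
MODRM_ACC_BX = 0xD8     # mod=11, reg=011 (bx/ebx/rbx), rm=000 (ax/eax/rax)


def encode_mov_acc_from_bx(operand_size, mode):
    """Encode mov ax,bx / mov eax,ebx / mov rax,rbx for a given CPU mode.

    operand_size: 16, 32 or 64. mode: 16, 32 or 64 (bits).
    """
    if mode not in (16, 32, 64):
        raise ValueError("mode must be 16, 32 or 64")
    # 64-bit mode still defaults to 32-bit operand-size.
    default = 16 if mode == 16 else 32
    if operand_size == 64:
        if mode != 64:
            raise ValueError("64-bit operand-size needs 64-bit mode")
        prefix = b"\x48"  # REX.W
    elif operand_size in (16, 32):
        # 66h flips between 16-bit and 32-bit, whichever is not the default.
        prefix = b"" if operand_size == default else b"\x66"
    else:
        raise ValueError("operand_size must be 16, 32 or 64")
    return prefix + bytes([OPCODE_MOV_RM_R, MODRM_ACC_BX])


if __name__ == "__main__":
    for size, mode in [(16, 16), (32, 16), (16, 32), (32, 32), (64, 64)]:
        code = encode_mov_acc_from_bx(size, mode)
        print(f"{size}-bit operands in {mode}-bit mode: {code.hex(' ')}")
```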