Tags: assembly, x86-64, machine-code, instruction-encoding

x64 instruction encoding (r/m, reg vs reg, r/m)


What's the difference in encoding between (ModRM:r/m, ModRM:reg) and (ModRM:reg, ModRM:r/m)? Specifically, take an instruction like CMPXCHG versus DIVPD. I thought the register and address were always encoded in the first byte, with the SIB and displacement following in the second byte if needed? Here's my code:

    static void WriteRegisterToMemory(ICollection<Byte> bytes, IRegisterToMemoryInstruction instruction, Byte rex)
    {
        IAddress address = instruction.Address;
        Byte register = instruction.Register;

        if (address.NeedsRex)
        {
            rex |= 0x40;
            if (address.RexB)
                rex |= 1;
            if (address.RexX)
                rex |= 1 << 1;
        }

        if (register > 7)
            rex |= 0x44;        // 0x40 (REX prefix marker) | 0x04 (REX.R)
        if (rex != 0)
            bytes.Add(rex);

        bytes.AddRange(instruction.Opcode);
        Byte modRM = (Byte)((register % 8) << 3);
        modRM |= address.GetModRMAddressByte();
        bytes.Add(modRM);
        address.WriteScaledIndexByteAndDisplacement(bytes);
    }

So are these two instructions encoded in exactly the same way, just with different opcodes? (These are the ADD forms on page 457 of the Intel x64 manual.)

    Op/En  Operand 1         Operand 2
    RM     ModRM:reg (r, w)  ModRM:r/m (r)
    MR     ModRM:r/m (r, w)  ModRM:reg (r)

Solution

  • There isn't really any difference in the encoding itself; the difference is in which operand is the source and which is the destination. Most instructions take the r/m operand as the source. The exceptions, where r/m is the destination, include cmpxchg, bts, and xadd; the r/m, r and r/m, imm forms of the ALU ops; and, obviously, movs to memory. xchg is ambiguous about it because it is symmetric. So when encoding those instructions (even if both operands are registers), be careful which way around the operands go, or they might end up swapped. But that's all: the ModRM byte, the SIB byte, and the displacement are laid out the same way in both forms.
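To make the "which way around" point concrete, here is a minimal sketch that encodes a 64-bit register-to-register ADD in both forms. The class and method names (`ModRMSketch`, `AddMR`, `AddRM`) are hypothetical and not part of the asker's code; the opcodes `01 /r` (ADD r/m64, r64) and `03 /r` (ADD r64, r/m64) are the two ADD forms from the Intel manual. Registers are passed as numbers 0-15 (rax = 0, rcx = 1, r9 = 9), and only the register-direct case (mod = 11) is covered, so there is no SIB byte or displacement.

```csharp
using System;

static class ModRMSketch
{
    // ModRM layout is the same for every form:
    // mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0.
    public static byte ModRM(int mod, int reg, int rm)
    {
        return (byte)(((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
    }

    // REX prefix is 0100WRXB. R extends ModRM:reg, B extends ModRM:r/m.
    // X is left clear because this sketch never emits a SIB byte.
    public static byte Rex(bool w, int reg, int rm)
    {
        int rex = 0x40;
        if (w) rex |= 0x08;        // REX.W: 64-bit operand size
        if (reg > 7) rex |= 0x04;  // REX.R
        if (rm > 7) rex |= 0x01;   // REX.B
        return (byte)rex;
    }

    // MR form, ADD r/m64, r64 (REX.W 01 /r):
    // destination goes in ModRM:r/m, source goes in ModRM:reg.
    public static byte[] AddMR(int dst, int src)
    {
        return new byte[] { Rex(true, src, dst), 0x01, ModRM(3, src, dst) };
    }

    // RM form, ADD r64, r/m64 (REX.W 03 /r):
    // destination goes in ModRM:reg, source goes in ModRM:r/m.
    public static byte[] AddRM(int dst, int src)
    {
        return new byte[] { Rex(true, dst, src), 0x03, ModRM(3, dst, src) };
    }
}
```

With rax = 0 and rcx = 1, `AddMR(0, 1)` yields `48 01 C8` and `AddRM(0, 1)` yields `48 03 C1`. Both decode as `add rax, rcx`; only the opcode and the placement of the two register numbers differ. Pairing the RM opcode with the MR placement (`48 03 C8`) would instead decode as `add rcx, rax`, which is the swapped-operand mistake described above.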