Tags: assembly, x86, instruction-encoding

Give the opcode and number of bytes of code for each of these instructions


I am currently taking a low-level programming class, and unfortunately I cannot find this information in our textbook (most of the questions are not covered in our text). I am having a hard time figuring out how to solve these. The question gives no information about the CPU or anything along those lines. It is as follows:

Give the op code and number of bytes of code for each of these instructions.
(a) mov exc, 984 Op Code_____ Number of Bytes_____
(b) xchg eax, ecx Op Code_____ Number of Bytes_____

There are a lot more, but if anyone could help me understand how to do these two, I hope to be able to translate that to the other questions.

One thing I tried, though I'm not sure what I'm looking at: I created a small *.asm project in Visual Studio 2019, added the line of code from (a), and looked at the listing file. It shows the following:

00000000  B9 000003D8         mov  ecx, 984   

I am not sure if the information is to be found here and I am just missing it, or there is some other way to figure this out.


Solution

  • Whenever you have questions about the encoding of instructions, check an instruction reference, such as AMD's or Intel's manuals. Specifically, Volume 2 of Intel's Software Developer's Manual (the instruction set reference) applies here. Web-browsable versions such as https://www.felixcloutier.com/x86/ are scraped from that PDF.

    The full PDFs have introductory chapters that explain the notation used in the entries for individual instructions.


    For xchg eax, ecx, we check the XCHG section in the manual. In the table there, the form we want is XCHG EAX, r32. It's encoded as 90+rd (90 here is hexadecimal), where rd is a code that designates which doubleword register is used.

    Looking earlier in the manual (right at the beginning of the Instruction Set Reference chapters in the full PDF), we find the definition of +rd, and see that ECX has a value of 1. The compact single-byte encoding of xchg eax, ecx is therefore 91 (again, hex).
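    The +rd lookup is simple enough to sketch in Python. This is only an illustration: the REG32 table just transcribes the register numbering from the manual, and encode_xchg_eax is a hypothetical helper name that handles nothing beyond 32-bit registers with no prefixes.

```python
# 3-bit register codes used by "+rd" (register numbering from the manual)
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_xchg_eax(reg: str) -> bytes:
    """Short form of XCHG EAX, r32: a single opcode byte, 90 + rd."""
    return bytes([0x90 + REG32[reg]])

print(encode_xchg_eax("ecx").hex(" ").upper())  # 91
```

    The register number is simply added to the base opcode, so the whole instruction fits in one byte.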

    xchg eax, ecx can also be assembled as 2 bytes using the general XCHG r/m32, r32 form (opcode 87 followed by a ModR/M byte, e.g. 87 C8), which is what one specific online assembler did for me. Because one of the operands is EAX, though, the 1-byte version is available.
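    For comparison, here is a sketch of the 2-byte form, assuming the general XCHG entry with opcode 87 /r and register-direct addressing (ModR/M mod field = 11). Which operand goes into the reg field and which into r/m is up to the assembler, so both 87 C8 and 87 C1 decode as xchg eax, ecx. The helper names modrm and encode_xchg_modrm are hypothetical.

```python
# 3-bit register codes (same numbering as +rd)
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModR/M byte: mod (2 bits) | reg (3 bits) | r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def encode_xchg_modrm(rm_reg: str, reg_reg: str) -> bytes:
    """XCHG via opcode 87 /r, with both operands in registers (mod = 0b11)."""
    return bytes([0x87, modrm(0b11, REG32[reg_reg], REG32[rm_reg])])

print(encode_xchg_modrm("eax", "ecx").hex(" ").upper())  # 87 C8
print(encode_xchg_modrm("ecx", "eax").hex(" ").upper())  # 87 C1
```

    The extra ModR/M byte is what costs the second byte; the short 90+rd form gets away without it because EAX is implied by the opcode.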


    For mov ecx, 984 (I'm assuming exc is a typo), we check the MOV section and find the instruction as MOV r32, imm32 in the table there, encoded as B8+rd id. From the XCHG example, we already know that the rd for ECX is 1, so the first byte is B9.

    Then comes id. Checking the same section where we found +rd, that's a 4-byte immediate operand, stored low-order byte first (little-endian). Converting 984 from decimal to hexadecimal gives 3D8 (3 × 256 + 13 × 16 + 8 = 984). Encoded in 4 bytes as little-endian, this is D8 03 00 00.
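    The decimal-to-hex conversion and the little-endian byte order can be checked with Python's standard struct module. This is just a quick sanity check, not part of the manual's procedure.

```python
import struct

imm = 984
print(hex(imm))  # 0x3d8

# "<" = little-endian, "I" = 4-byte unsigned int; masking to 32 bits
# lets negative values encode as two's complement too.
le = struct.pack("<I", imm & 0xFFFFFFFF)
print(le.hex(" ").upper())  # D8 03 00 00

# The same bytes by hand: emit the low-order byte first.
by_hand = bytes((imm >> (8 * k)) & 0xFF for k in range(4))
print(by_hand == le)  # True
```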

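    Putting it together, the encoded instruction is B9 D8 03 00 00, which is 5 bytes: 1 opcode byte plus a 4-byte immediate. This matches the Visual Studio listing in the question, which shows the opcode B9 followed by the immediate as the 32-bit value 000003D8 rather than in its little-endian byte order in memory.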
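    Putting both pieces together, a sketch of the whole MOV r32, imm32 encoding might look like this. It covers only this one table entry in 32-bit code (no prefixes, and not the longer C7 /0 form an assembler could also choose); encode_mov_r32_imm32 is a hypothetical name.

```python
import struct

# 3-bit register codes (same numbering as +rd)
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """MOV r32, imm32: opcode B8 + rd, then id (4 bytes, little-endian)."""
    opcode = 0xB8 + REG32[reg]
    # Mask to 32 bits so negative immediates encode as two's complement.
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)

code = encode_mov_r32_imm32("ecx", 984)
print(code.hex(" ").upper(), "-", len(code), "bytes")  # B9 D8 03 00 00 - 5 bytes
```

    The byte count falls straight out of the table entry: one byte for B8+rd and four for id, so every instruction of this form is 5 bytes regardless of the register or value.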


    (Fun fact: x86 registers are numbered in the order EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, i.e. codes 0 through 7. The first 4 are not in alphabetical order, for what are probably obscure historical reasons.)


    You can verify these with any assembler, such as NASM, by asking it to produce a "listing" (the Visual Studio listing in the question is the same idea). For example, nasm -felf32 foo.asm -l /dev/stdout prints a listing to the terminal.