On many architectures - most RISC ISAs, for example - all instructions are the same length, which makes fetching and decoding them straightforward. On x86/x64, instructions are variable-length, so a disassembled program might look like this:
File Type: EXECUTABLE IMAGE

00401000: 8B 04 24           mov     eax,dword ptr [esp]
00401003: 83 C4 04           add     esp,4
00401006: FF 64 24 FC        jmp     dword ptr [esp-4]
0040100A: 55                 push    ebp
0040100B: E8 F0 FF FF FF     call    00401000
00401010: 50                 push    eax
00401011: 68 00 30 40 00     push    403000h
00401016: E8 0D 00 00 00     call    00401028
0040101B: 83 C4 08           add     esp,8
0040101E: 33 C0              xor     eax,eax
00401020: 5D                 pop     ebp
00401021: 83 C4 04           add     esp,4
00401024: FF 64 24 FC        jmp     dword ptr [esp-4]
00401028: FF 25 00 20 40 00  jmp     dword ptr ds:[00402000h]

Summary

1000 .data
1000 .rdata
1000 .reloc
1000 .text
It seems rather difficult to imagine how the CPU "knows" where one instruction ends and the next one begins. For example, if I add the byte 0x90 (NOP) to the middle of the XOR EAX,EAX encoding, the program then disassembles as:
File Type: EXECUTABLE IMAGE

00401000: 8B 04 24           mov     eax,dword ptr [esp]
00401003: 83 C4 04           add     esp,4
00401006: FF 64 24 FC        jmp     dword ptr [esp-4]
0040100A: 55                 push    ebp
0040100B: E8 F0 FF FF FF     call    00401000
00401010: 50                 push    eax
00401011: 68 00 30 40 00     push    403000h
00401016: E8 0D 00 00 00     call    00401028
0040101B: 83 C4 08           add     esp,8
0040101E: 33 90 C0 5D 83 C4  xor     edx,dword ptr [eax+C4835DC0h]
00401024: 04 FF              add     al,0FFh
00401026: 64 24 FC           and     al,0FCh
00401029: FF
0040102A: 25
0040102B: 00 20              add     byte ptr [eax],ah
0040102D: 40                 inc     eax

Summary

1000 .data
1000 .rdata
1000 .reloc
1000 .text
Which, predictably, crashes when run.
I'm curious exactly what the instruction decoder sees with that extra byte that makes it think the line at 0040101E is 6 bytes long, and the line originally at 00401028 is four separate instructions.
When fetching an instruction, the CPU first analyses its first byte (the opcode). Sometimes that byte alone is enough to determine the total length of the instruction; sometimes it tells the CPU to analyse subsequent bytes (the ModR/M byte, for instance) to work out the length. But all in all, the encoding is not ambiguous.
Yes, the command stream gets screwed up if you insert random bytes in the middle willy-nilly. That's to be expected; not every byte sequence constitutes valid machine code.
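To make that concrete, here's a rough sketch in C of how the leading bytes decide the length of a few of the opcodes in your listing. It's an illustration, not a real decoder - the names are mine, and prefixes, SIB bytes and most of the opcode map are deliberately ignored.

#include <stdint.h>
#include <stdio.h>

/* Very rough length calculation for a handful of the opcodes that appear
   in the listing above. Everything else is unsupported in this sketch. */
static int rough_length(const uint8_t *p)
{
    switch (p[0]) {
    case 0x50: case 0x55: case 0x5D:   /* push/pop reg: the opcode alone */
    case 0x90:                         /* nop                            */
        return 1;
    case 0x68:                         /* push imm32: opcode + 4 bytes   */
    case 0xE8:                         /* call rel32: opcode + 4 bytes   */
        return 5;
    case 0x33: {                       /* xor r32, r/m32: needs ModR/M   */
        uint8_t mod = p[1] >> 6;
        if (mod == 3) return 2;        /* register, register (e.g. C0)   */
        if (mod == 2) return 6;        /* [reg + disp32]     (e.g. 90)   */
        return -1;                     /* other forms omitted here       */
    }
    default:
        return -1;                     /* not handled in this sketch     */
    }
}

int main(void)
{
    const uint8_t original[] = { 0x33, 0xC0 };                         /* xor eax,eax       */
    const uint8_t patched[]  = { 0x33, 0x90, 0xC0, 0x5D, 0x83, 0xC4 }; /* xor edx,[eax+...] */

    printf("33 C0    -> %d bytes\n", rough_length(original)); /* prints 2 */
    printf("33 90 .. -> %d bytes\n", rough_length(patched));  /* prints 6 */
    return 0;
}

The point is simply that for a 33 opcode the length can't be known until the ModR/M byte has been looked at.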
Now, about your particular example. The original command was XOR EAX, EAX (33 C0). The encoding of XOR is one of those second-byte-dependent ones. The first byte - 33 - means XOR. The second byte is the ModR/M byte. It encodes the operands - whether it's a register pair, a register and a memory location, etc. The initial value C0 in 32-bit mode corresponds to the operands EAX, EAX. The value 90 that you've inserted corresponds to the operands EDX, [EAX+offset], and it means that the ModR/M byte is followed by 32 bits of offset. The next four bytes of the command stream are not interpreted as commands anymore - they're the offset in the mangled XOR command.
So by messing with the second byte, you've turned a 2-byte command into a 6-byte one.
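Here's the same ModR/M arithmetic spelled out, again just a sketch that only covers the two addressing forms involved here (C0 and 90); everything else, SIB bytes included, is left out.

#include <stdint.h>
#include <stdio.h>

static const char *reg32[8] = { "eax", "ecx", "edx", "ebx",
                                "esp", "ebp", "esi", "edi" };

/* Break a ModR/M byte into its three fields and show which operands it
   selects for opcode 33 (xor r32, r/m32) in 32-bit mode. */
static void explain_modrm(uint8_t modrm)
{
    unsigned mod = modrm >> 6;          /* top two bits                  */
    unsigned reg = (modrm >> 3) & 7;    /* middle three bits             */
    unsigned rm  = modrm & 7;           /* bottom three bits             */

    printf("%02X: mod=%u reg=%u rm=%u -> ", (unsigned)modrm, mod, reg, rm);
    if (mod == 3)                       /* register-register form        */
        printf("xor %s, %s (2 bytes total)\n", reg32[reg], reg32[rm]);
    else if (mod == 2 && rm != 4)       /* [register + disp32] form      */
        printf("xor %s, [%s + disp32] (6 bytes total)\n",
               reg32[reg], reg32[rm]);
    else
        printf("some other addressing form (not covered here)\n");
}

int main(void)
{
    explain_modrm(0xC0);  /* the original second byte: xor eax, eax       */
    explain_modrm(0x90);  /* the inserted NOP byte, read as ModR/M        */
    return 0;
}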
Then the CPU (and the disassembler) resumes fetching after those four offset bytes. It's now in the middle of the ADD ESP, 4 instruction, but the CPU has no way of knowing that. It starts with the 04 byte, the third one in the ADD encoding. The first few bytes at that point still make sense as commands, but since you've ended up in the middle, the original instruction sequence is utterly lost.
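You can watch the resynchronisation happen by walking the two byte streams with a toy length function that knows only the handful of byte patterns occurring in that stretch of code (again, purely an illustration - don't mistake it for a real decoder):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Length of the next instruction, for *only* the byte patterns that occur
   in the run of code starting at 0040101E. Anything else is unsupported. */
static size_t toy_length(const uint8_t *p)
{
    switch (p[0]) {
    case 0x5D: return 1;                        /* pop ebp                        */
    case 0x04: return 2;                        /* add al, imm8                   */
    case 0x83: return 3;                        /* add r/m32, imm8 (mod=11 here)  */
    case 0x24: return 2;                        /* and al, imm8                   */
    case 0x64: return 1 + toy_length(p + 1);    /* FS segment prefix              */
    case 0x33:                                  /* xor r32, r/m32                 */
        return (p[1] >> 6) == 3 ? 2 : 6;        /* reg,reg vs [reg+disp32]        */
    case 0xFF: return 4;                        /* jmp [esp-4]: ModR/M+SIB+disp8  */
    default:   return 0;
    }
}

static void walk(const char *name, const uint8_t *code, size_t size)
{
    printf("%s:\n", name);
    for (size_t off = 0; off < size; ) {
        size_t len = toy_length(code + off);
        if (len == 0) break;
        printf("  +%02zu:", off);
        for (size_t i = 0; i < len; i++)
            printf(" %02X", (unsigned)code[off + i]);
        printf("\n");
        off += len;
    }
}

int main(void)
{
    /* Bytes at 0040101E..00401027 in the original program:
       xor eax,eax / pop ebp / add esp,4 / jmp dword ptr [esp-4]      */
    const uint8_t original[] = { 0x33, 0xC0, 0x5D, 0x83, 0xC4, 0x04,
                                 0xFF, 0x64, 0x24, 0xFC };
    /* The same bytes with 90 inserted after the 33                   */
    const uint8_t patched[]  = { 0x33, 0x90, 0xC0, 0x5D, 0x83, 0xC4,
                                 0x04, 0xFF, 0x64, 0x24, 0xFC };

    walk("original", original, sizeof original);
    walk("patched ", patched,  sizeof patched);
    return 0;
}

On the patched stream the walk prints 04 FF and then 64 24 FC as separate instructions - exactly the add al,0FFh and and al,0FCh lines in your second disassembly.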