Tags: assembly, x86, intel, machine-code, relative-addressing

Why do call and jump instructions use a displacement relative to the next instruction, not the current one?


In the table below, extracted from Intel's docs, the opcodes E8 cw and E8 cd both say the displacement is relative to the next instruction.

Why the next instruction? Why isn't the displacement relative to the call instruction itself?

    Opcode   Instruction   Description
    E8 cw    CALL rel16    Call near, relative, displacement relative to next instruction.
    E8 cd    CALL rel32    Call near, relative, displacement relative to next instruction.

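To make the table's wording concrete, here is a small Python sketch (the addresses and byte sequences are made up for illustration) that decodes an E8 cd (call rel32) the way Intel describes it: the signed 32-bit displacement is added to the address of the *next* instruction, i.e. the address of the E8 byte plus the 5-byte instruction length.

```python
import struct

def call_rel32_target(addr, code):
    """Compute the target of a near relative call (E8 + rel32).

    addr is the address of the E8 opcode byte; the displacement is
    relative to the end of the 5-byte instruction (the next instruction).
    """
    assert code[0] == 0xE8
    disp = struct.unpack_from("<i", code, 1)[0]  # signed little-endian rel32
    return (addr + 5 + disp) & 0xFFFFFFFF

# E8 20 00 00 00 at (hypothetical) address 0x1000:
print(hex(call_rel32_target(0x1000, bytes([0xE8, 0x20, 0x00, 0x00, 0x00]))))
# -> 0x1025 (= 0x1000 + 5 + 0x20), not 0x1020
```

Note the target is 0x1025, not 0x1020: the displacement counts from the end of the call, which is exactly the question being asked.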

Solution

  • TL:DR: the decoder has to find the end of the instruction anyway, in order to set up decoding of the next instruction. It's very normal for CPUs to do relative addressing relative to the end of the current instruction, although some CPUs make different choices, like relative to the end of the *next* instruction (as in ARM's PC-relative memory addressing).

    See Does Program Counter hold current address or the address of the next instruction?


    x86 machine-code design was set in stone in the late 70's with 8086, except for things (like 32/64-bit ModRM+SIB addressing modes) that got redesigned when extending the ISA.

    The original 8086 decoded instruction bytes sequentially (not necessarily a whole instruction at once), and had no upper limit on the number of prefix bytes or on total instruction length.

    I think 8086 avoided ever needing to save the starting address of an instruction, even for exceptions. For example, on modern x86 #DE (divide exception) pushes the address of the faulting instruction. But on 8086 the exception frame has the address of the next instruction.

    8086 even has a "bug" (or documented design flaw) where interrupts that arrive during execution of cs rep movsb (for example) push the address of the final prefix as the exception-return address, making segment overrides on rep-string instructions essentially unusable with interrupts enabled. (Because execution will resume either without the rep or without the segment override, whichever you put first.) See x86 Program Counter abstracted from microarchitecture? and comments.


    When 8086 finishes decoding a call instruction, it doesn't know where it started. The only reference point it has is the end of the call instruction. So if they wanted that hardware optimization (never keeping the decode-start address anywhere), they didn't even have a choice. They could in theory have used the address of the E8 call opcode (after any prefixes) as the anchor, but that would probably have taken an extra adder or extra hardware to record that address separately.

    Fetch/decode already has to find the end of the instruction during decode (while figuring out that it's a call or jmp), so the end-of-instruction / address-of-next-instruction is already internally available. call even has to push that value on the stack as the return address.
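An assembler has to do the inverse computation when emitting the bytes. A hedged sketch (function name and addresses are mine, not from any real assembler) showing that the anchor the displacement is measured from is exactly the next-instruction address that call pushes as the return address:

```python
import struct

def encode_call_rel32(addr, target):
    """Emit E8 + rel32 for a near call placed at address `addr`."""
    return_addr = addr + 5            # end of the 5-byte call = pushed return address
    disp = target - return_addr       # displacement relative to next instruction
    return bytes([0xE8]) + struct.pack("<i", disp)

print(encode_call_rel32(0x1000, 0x1025).hex())  # forward call: e820000000
print(encode_call_rel32(0x1000, 0x0F00).hex())  # backward call: negative, sign-extended disp
```

The same `addr + 5` value serves both purposes, which is the answer's point: the hardware already has it in hand.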

    A pipelined RISC, or a completely unpipelined CPU, would also use that next-instruction address to fetch the next instruction from memory or I-cache. In practice, though, 8086 prefetch is asynchronous into a small prefetch buffer. The machine-code format was designed on paper mostly before the design of the implementation, so this common reason for making things relative to the end of an instruction may have been what the architect had in mind.

    It's a common design choice for many ISAs to make branches relative to the end of the instruction.
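The same end-of-instruction convention holds for x86's short jumps: a 2-byte jmp rel8 (opcode EB) adds its sign-extended 8-bit displacement to the address after the jump, which is why the classic byte pair EB FE is an infinite loop jumping to itself. A minimal sketch of that arithmetic:

```python
def jmp_rel8_target(addr, code):
    """Target of a short jump (EB disp8): next instruction + sign-extended disp8."""
    assert code[0] == 0xEB
    disp = code[1] - 0x100 if code[1] >= 0x80 else code[1]  # sign-extend 8 -> int
    return addr + 2 + disp

print(hex(jmp_rel8_target(0x1000, bytes([0xEB, 0xFE]))))  # -> 0x1000, jump-to-self
```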


    Just to reiterate, the reason I'm talking only about 8086 (which is internally very different from modern x86) is that it was the first generation, and understanding it helps explain some machine-code design decisions. (E.g. why x86 spends 8 opcodes on single-byte xchg [e/r]ax, reg: because 8086 didn't have movsx or 2-operand imul, and needed or wanted AX for a lot of stuff. Also, code-size was the major bottleneck for performance.)

    Modern x86 just keeps track of every instruction's address and can use that when decoding call rel32; it's not a big deal. See also Why do x86 jump/call instructions use relative displacements instead of absolute destinations?