Tags: assembly, intel, x86-16, disassembly, pipelining

Is this an example of Intel pipelining?


As far as I know, Intel 8086 pipelining is the technique of
fetching the next instruction while the current instruction is being executed.

This article says that one advantage of pipelining is that it
eliminates the waiting time of the EU (execution unit) and speeds up processing.

I think instructions like lea 0x7(%eax), %ecx can be split into several instructions,
such as add $0x7, %eax; lea %eax, %ecx.

My thinking)
So, by definition,
I think the above example matches the definition of Intel 8086 pipelining
because it executes several instructions in one time slot,
and so an operation like this speeds up processing.

Question)
I'm curious whether the instructions below are examples of pipelining.

main:
 mov $0x2, %eax
 mov $0x3, %esi
 lea (%eax), %ecx           # result: 2. Pipelining?
 lea 0x7(%eax), %ecx        # result: 9. Pipelining?
 lea 0x7(%eax,%esi,1), %ecx # result: 12. Pipelining?
 lea 0x7(,%esi,4), %ecx     # result: 19. Pipelining?
 lea 0x7(%eax,%esi,4), %ecx # result: 21. Pipelining?
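
For reference, the values in the comments follow the addressing formula displacement + base + index * scale. A minimal Python sketch to check them (the helper name effective_address is made up for illustration, and 32-bit wraparound is assumed):

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Value that `lea disp(base, index, scale), dst` would place in dst."""
    # lea only computes the address; it never reads memory.
    return (disp + base + index * scale) & 0xFFFFFFFF  # wrap to 32 bits (assumption)


eax, esi = 0x2, 0x3  # values loaded by the two mov instructions

results = [
    effective_address(base=eax),                                # lea (%eax), %ecx
    effective_address(disp=0x7, base=eax),                      # lea 0x7(%eax), %ecx
    effective_address(disp=0x7, base=eax, index=esi, scale=1),  # lea 0x7(%eax,%esi,1), %ecx
    effective_address(disp=0x7, index=esi, scale=4),            # lea 0x7(,%esi,4), %ecx
    effective_address(disp=0x7, base=eax, index=esi, scale=4),  # lea 0x7(%eax,%esi,4), %ecx
]
print(results)  # [2, 9, 12, 19, 21]
```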

Solution

  • The very first computers were not pipelined. They fetched an instruction, executed all the cycles it required, and then moved on to the next instruction. On average, an instruction took 5-6 cycles. This was how microprocessors worked up to the mid-80s, including the 8086 (introduced in 1978).

    In the late seventies, people discovered that pipelining was an efficient way to improve performance. An early design was the IBM 801, but pipelining really succeeded with Sun SPARC, Berkeley RISC, and MIPS in the mid-eighties.

    The idea is to split every instruction into similar stages and to assign each stage to independent hardware resources, so that a new instruction can start without waiting for the previous one to complete. Ideally, this allows one new instruction to start per cycle. Because of interactions between instructions (hazards), the real rate is closer to one instruction every ~1.5 cycles, but the gain over the previous generation is huge (about 3x the performance).
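
    The cycle counts above can be sketched with a simple timing model. This is only an illustration: the 5-stage depth, the 1000-instruction program, and the average of 0.5 stall cycles per instruction are assumptions chosen to match the figures in the text, and the function names are made up.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages, stall_cycles=0):
    """The first instruction fills the pipeline; each later one finishes
    one cycle after its predecessor, plus any cycles lost to hazards."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


n, k = 1000, 5  # assumed program length and pipeline depth

baseline = cycles_unpipelined(n, k)
ideal = cycles_pipelined(n, k)
with_stalls = cycles_pipelined(n, k, stall_cycles=n // 2)  # ~1.5 cycles/instruction

print(f"ideal speedup:       {baseline / ideal:.1f}x")
print(f"speedup with stalls: {baseline / with_stalls:.1f}x")
```

    With these assumed numbers the ideal speedup approaches the number of stages, and the version with stalls lands a little above 3x.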

    The problem is that pipelining is easiest with instruction sets built around simple operations and simple data movement (called RISC instruction sets). New computers were designed around this scheme, but older instruction sets, including x86, were not suited to it.

    To get the benefit of pipelining while keeping backward compatibility, Intel adopted a microarchitecture made of two parts: the first fetches x86 instructions and translates them into pipeline-friendly ones (called μOps), and the second executes these μOps in a pipeline. This was first introduced with the Pentium Pro (1995) and has been present in every version since.

    The code that you give

    lea 0x7(%eax), %ecx

    translated to

    add $0x7, %eax;
    lea %eax, %ecx

    illustrates the idea of μOp translation. Translation is particularly useful for converting ALU operations with an in-memory operand into simpler operations that each perform a single task (a memory transfer or an ALU op).
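
    As a toy illustration of that last point, the sketch below splits an ALU instruction with a memory operand into single-task steps. Everything in it is hypothetical: real μOp formats are internal to each processor, and the function name, the tuple layout, and the temporaries tmp0 and tmp1 are invented for this example.

```python
def to_micro_ops(op, src, dst):
    """Split one two-operand ALU instruction (AT&T order: op src, dst)
    into steps that each do a single task. Operands written in
    parentheses, such as "(%ebx)", stand for memory locations."""

    def is_mem(operand):
        return operand.startswith("(")

    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary first.
        uops.append(("load", src, "tmp0"))
        src = "tmp0"
    if is_mem(dst):
        # Memory destination: load, operate on the temporary, store back.
        uops.append(("load", dst, "tmp1"))
        uops.append((op, src, "tmp1"))
        uops.append(("store", "tmp1", dst))
    else:
        uops.append((op, src, dst))
    return uops


print(to_micro_ops("add", "(%ebx)", "%eax"))  # load, then add
print(to_micro_ops("add", "%eax", "(%ebx)"))  # load, add, then store
print(to_micro_ops("add", "%ebx", "%eax"))    # already a single step
```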

    Today virtually all processors are pipelined (except some simple microcontrollers and some processors embedded in FPGAs).

    Any sequence of instructions you give will be executed in the pipeline, including of course the ones in your question. The only limitation is that, depending on how the instructions interact, hazards can occur that slow down (stall) the pipeline.
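
    One common kind of hazard is a read-after-write dependency, where an instruction reads a register that a recent instruction has just written. The sketch below looks for such pairs in the sequence from the question. It is a simplified model, not a description of any real processor: the window parameter (how close the producer must be for the dependency to matter) is an assumption, since the real distance depends on pipeline depth and forwarding.

```python
# (destination register, source registers) for each instruction in the question
program = [
    ("eax", []),              # 0: mov $0x2, %eax
    ("esi", []),              # 1: mov $0x3, %esi
    ("ecx", ["eax"]),         # 2: lea (%eax), %ecx
    ("ecx", ["eax"]),         # 3: lea 0x7(%eax), %ecx
    ("ecx", ["eax", "esi"]),  # 4: lea 0x7(%eax,%esi,1), %ecx
    ("ecx", ["esi"]),         # 5: lea 0x7(,%esi,4), %ecx
    ("ecx", ["eax", "esi"]),  # 6: lea 0x7(%eax,%esi,4), %ecx
]


def raw_dependencies(program, window=1):
    """Return (producer, consumer) index pairs where the consumer reads a
    register written at most `window` instructions earlier."""
    pairs = []
    for i, (_, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            if program[j][0] in sources:
                pairs.append((j, i))
    return pairs


print(raw_dependencies(program, window=1))  # []
print(raw_dependencies(program, window=2))  # [(0, 2)]
```

    In this model, no lea reads the result of the instruction immediately before it, and none of them reads %ecx, so the lea instructions do not depend on one another.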

    I think above example matches the definition of intel 8086 pipelining

    Pipelining is a characteristic of a microarchitecture, not of an instruction set. So the 8086 microarchitecture was NOT pipelined, but subsequent implementations of its instruction set (called x86 or IA-32) are.

    because it executes several instructions at one time slot

    You are actually right that several instructions can be started in one time slot, but this is a different technique, built on top of pipelining, that allows parallel execution at the instruction level. It is called superscalar execution.
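
    The difference from plain pipelining can be sketched by adding an issue width to a simple timing model. This is an idealized illustration that assumes every instruction is independent (no hazards); the function name, the 5-stage depth, and the issue widths are assumptions.

```python
import math


def cycles_superscalar(n_instructions, n_stages, issue_width):
    """Idealized cycle count when up to `issue_width` independent
    instructions enter the pipeline in the same cycle."""
    if n_instructions == 0:
        return 0
    issue_groups = math.ceil(n_instructions / issue_width)
    return n_stages + issue_groups - 1


n, k = 1000, 5  # assumed program length and pipeline depth
for width in (1, 2, 4):
    print(width, cycles_superscalar(n, k, width))
# 1 1004   (plain pipelining)
# 2 504
# 4 254
```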