assembly pipeline cpu-architecture accumulator

Is it possible to implement pipelining in an accumulator-based machine/ISA?


I am not sure if this is the correct place to ask this but here it goes:

I was wondering if it is possible to implement pipeline stages in a von Neumann architecture that uses an accumulator to hold a value, along with a PC, memory buffer register, instruction register, and memory address register. There would also be an output register to hold the output and an input register to hold input.

I was thinking that a 3-stage pipeline (Fetch, Decode, Execute) would be more feasible than a 5-stage pipeline, as the latter would introduce the need for extra registers.

Are there any examples of this and is it possible to implement theoretically?


Solution

  • Sure, of course. At the very least you can pipeline fetch/decode, and pipelining the data load as well would probably be extra helpful, since every instruction will have a data memory address embedded in it.

    (Some instructions might allow a memory-indirect addressing mode, i.e. load a pointer from memory and then dereference it, which allows indirect addressing without self-modifying code. An in-order pipeline would probably have to stall while doing the 2nd load. Self-modifying code sucks a lot for pipelining, and especially for OoO exec, if you want to keep the pipeline coherent with stores. If you do want to support self-modifying code, which some toy accumulator ISAs (e.g. Little Man Computer) need in order to do things like loop over an array, you could keep it simple: only guarantee coherence after a jump or something, and discard fetch/decode results on jumps to make sure you're observing newly-stored instructions.)

    Register-renaming with out-of-order execution would probably be valuable, given that software has only 1 architectural register to play with. A store buffer with store forwarding would provide the equivalent for memory / cache, given effective memory disambiguation (store-forwarding detection). See this Q&A for more about what store buffers do, and links from there including https://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/.

    Note that modern x86 CPUs are able to pipeline instructions like add eax, [mem] (a memory-source add into the accumulator) and mov [mem], eax (a store of the accumulator). (x86 has other registers, but you could in theory use it with just the one.) Modern x86 CPUs decode memory-source ALU instructions to 2 uops, the load and the add, which execute separately in the out-of-order back-end. See Modern Microprocessors A 90-Minute Guide! for a gentle intro that builds up to more complex CPUs, including modern x86 decoding to "RISC-like" micro-ops.

    You could build a pipeline pretty much exactly like Intel Sandybridge-family (see David Kanter's deep dive with block diagrams of different parts of the front and back ends in Haswell - https://www.realworldtech.com/haswell-cpu/) or AMD Zen, that runs an accumulator ISA instead of x86.

    Or to keep it a bit simpler, like P5 Pentium (dual issue in-order without decoding to RISC-like uops, so it can't pipeline memory-source ALU instructions as well), or 486 (in-order pipelined single issue).


    I doubt there are any commercial examples; AFAIK no pure accumulator ISAs are relevant enough that anyone would want to buy a high-performance implementation of an inherently inefficient ISA, instead of buying a CPU that could run a register ISA faster for the same cost in dollars, power, silicon, and design effort.

    (i.e. you could do this, or you could spend similar effort to pipeline a good ISA like ARM, MIPS, or RISC-V. Or even something more CISCish like m68k.)


    There's no reason to assume a toy microarchitecture with a single MAR/MBR/IR, though. Each instruction needs its own IR as it moves through the pipeline, assuming it's an ISA with fixed-width instructions where it even makes sense to have an IR, rather than just carrying control signals based on decode results.

    The Q&A "x86 registers: MBR/MDR and instruction registers" explains why real-world x86 CPUs don't have just one of those, and don't have an "IR" at all. It also explains why a MAR/MBR is too simple a model for pipelined cache access, especially on a CPU with virtual memory.
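
To make the fetch/decode/execute overlap and the "discard fetch/decode results on jumps" idea a bit more concrete, here is a minimal cycle-level sketch in Python. Everything in it is an assumption for illustration, not a description of any real CPU: the opcode numbers, the `op * 100 + addr` word encoding (loosely LMC-like), and the names `enc`, `run`, and `program` are all hypothetical. Each stage has its own latch instead of one shared IR, data memory is only accessed in the execute stage, and a taken jump flushes the two younger stages.

```python
# Hypothetical toy accumulator ISA: one word = opcode * 100 + address.
# Opcode numbers are invented for this sketch.
HLT, ADD, SUB, STA, LDA, JMP, BRZ = 0, 1, 2, 3, 5, 6, 7

def enc(op, addr):
    """Encode an instruction word (assumed encoding: op * 100 + addr)."""
    return op * 100 + addr

def run(mem, max_cycles=10_000):
    """Simulate a 3-stage in-order pipeline; return (acc, cycles, retired)."""
    mem = list(mem)      # unified code + data memory (von Neumann style)
    acc, pc = 0, 0
    fetched = None       # fetch -> decode latch: raw instruction word
    decoded = None       # decode -> execute latch: (op, addr)
    retired = 0
    for cycle in range(1, max_cycles + 1):
        redirect = None
        # EXECUTE: the only stage that touches the accumulator or data memory.
        if decoded is not None:
            op, addr = decoded
            retired += 1
            if op == HLT:
                return acc, cycle, retired
            elif op == LDA:
                acc = mem[addr]
            elif op == ADD:
                acc += mem[addr]
            elif op == SUB:
                acc -= mem[addr]
            elif op == STA:
                mem[addr] = acc
            elif op == JMP or (op == BRZ and acc == 0):
                redirect = addr          # taken jump
        # DECODE: split the word fetched last cycle into (op, addr).
        decoded = divmod(fetched, 100) if fetched is not None else None
        # FETCH: read the next word; this may be wrong-path work.
        fetched = mem[pc] if pc < len(mem) else None
        pc += 1
        if redirect is not None:
            # Flush both younger stages and refetch from the jump target.
            # This is also the only point where stores to code are
            # guaranteed to be seen by fetch in this sketch.
            fetched = decoded = None
            pc = redirect
    raise RuntimeError("cycle limit reached")

# Sum 3 + 2 + 1 with a countdown loop; data lives after the code.
program = [
    enc(LDA, 10),  # 0: acc = count
    enc(BRZ, 8),   # 1: if count == 0 goto done
    enc(ADD, 11),  # 2: acc += total
    enc(STA, 11),  # 3: total = acc
    enc(LDA, 10),  # 4: acc = count
    enc(SUB, 12),  # 5: acc -= 1
    enc(STA, 10),  # 6: count = acc
    enc(JMP, 0),   # 7: loop
    enc(LDA, 11),  # 8: done: acc = total
    enc(HLT, 0),   # 9
    3, 0, 1,       # 10: count, 11: total, 12: constant one
]

if __name__ == "__main__":
    acc, cycles, retired = run(program)
    print(f"acc={acc} cycles={cycles} retired={retired}")
```

With this program the sketch retires 28 instructions in 38 cycles: 2 cycles to fill the pipeline, plus a 2-cycle bubble for each of the 4 taken jumps. It deliberately does nothing about a store that hits an instruction already sitting in the fetch or decode latch, which is the "only coherent after a jump" simplification mentioned above.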