Multicycle implementation of MIPS ISA

I'm trying to understand how the MIPS multicycle implementation works. Apparently, we need temporary registers to store the results of memory reads, register reads, and the ALU. However, I am struggling to see why. All I know is that otherwise the data will be lost in the next cycle, but I don't understand why that is the case. In the case of registers A and B, whose contents will be the data read from rs and rt, won't their values stay the same as long as IR has the same value? Moreover, what happens if the memory data register (MDR) isn't in the circuit? Won't that be okay? Also, I'd like to confirm: do we need the instruction register (IR) because we don't want to accidentally execute data being loaded as if it were an instruction? In addition, is ALUOut necessary because we want to keep PC+4+offset before it gets overwritten by the ALU result for the branch instruction? I am probably wrong, so some feedback would be great. Thanks in advance.

Solution

Let's start with an analogy. The way programs in imperative-style languages work is to break necessary work into statements, and those statements are then interconnected by variables.

In this analogy, imagine you want to compute k = 2i+j, but break the computation into two operations/statements/cycles: first compute 2i, then add j. But where do we store the intermediate result, 2i? The answer is in some internal & intermediate storage, and certainly not in any program variable, since wiping out one of those would be bad for the program. So, we might compute t0 = 2i; then k = t0 + j; where t0 is extra, hidden, internal storage and thus does not conflict with any program variables.
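The two-statement version of the analogy can be written out directly, with `t0` playing the role of the hidden temporary register. This is only an illustration of the analogy in Python, not processor code; the function name `compute_k` is made up for the example.

```python
def compute_k(i: int, j: int) -> int:
    """Compute k = 2i + j in two 'cycles' joined by a hidden temporary."""
    # Cycle 1: compute 2i and park it in t0, an internal temporary that
    # is not one of the program's own variables (i, j, k).
    t0 = 2 * i
    # Cycle 2: start immediately from t0 instead of recomputing 2i.
    k = t0 + j
    return k


print(compute_k(3, 4))  # prints 10
```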

Breaking an instruction's execution into multiple cycles necessarily produces such intermediate results: outputs of one cycle that become inputs to another, in the same way that program statements are interconnected, sometimes by temporary variables.

The temporary registers in a multi-cycle or pipelined processor hold these internal & intermediate results: they capture the progress the instruction made in the prior cycle and communicate that progress, as state, to the next cycle.

Dynamically speaking, there is a lot of state involved in a single-cycle processor: control signals, decoded values, sign-extended values, ALU results, all above and beyond the architectural register file. But that state does not need to be stored anywhere, since it simply propagates through the processor's combinational logic during the cycle. At the end of the cycle the architectural registers (register file & PC) are updated, so the next instruction can start with just the architecturally visible state.

However, when we split execution into piece parts, as with multi-cycle or pipelined processors, the extra internal intermediate state that would have propagated across the single-cycle processor needs to be captured in registers so that the next cycle has something to start with.
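A minimal Python sketch of that idea, assuming a toy model in which each cycle is one method call and only the R-type `add` is supported. Local variables inside a method vanish when it returns, much like combinational values do at the clock edge, so anything the next cycle needs is stored in a field (`ir`, `a`, `b`, `alu_out`). The class and method names are hypothetical; only the instruction encoding follows the MIPS R-type format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultiCycleCPU:
    """Toy multicycle model that only handles R-type add (hypothetical)."""

    # Architecturally visible state.
    regs: List[int] = field(default_factory=lambda: [0] * 32)
    pc: int = 0
    # Internal latches: invisible to the programmer, alive between cycles.
    ir: int = 0
    a: int = 0
    b: int = 0
    alu_out: int = 0

    def fetch(self, instr_mem: Dict[int, int]) -> None:
        # Cycle 1: capture the fetched word in IR; PC moves on right away.
        self.ir = instr_mem[self.pc]
        self.pc += 4

    def decode(self) -> None:
        # Cycle 2: rs and rt are locals, gone when this method returns...
        rs = (self.ir >> 21) & 0x1F
        rt = (self.ir >> 16) & 0x1F
        # ...so the register values the next cycle needs are latched in A and B.
        self.a = self.regs[rs]
        self.b = self.regs[rt]

    def execute(self) -> None:
        # Cycle 3: the ALU starts from A and B immediately; result goes to ALUOut.
        # The sum wraps to 32 bits; a real MIPS add would trap on signed
        # overflow, which this sketch ignores.
        self.alu_out = (self.a + self.b) & 0xFFFFFFFF

    def write_back(self) -> None:
        # Cycle 4: IR still holds this instruction in this model, so rd comes from it.
        rd = (self.ir >> 11) & 0x1F
        if rd != 0:  # register $zero is hard-wired to 0
            self.regs[rd] = self.alu_out


cpu = MultiCycleCPU()
cpu.regs[8], cpu.regs[9] = 5, 7        # $t0 = 5, $t1 = 7
program = {0: 0x01095020}              # add $t2, $t0, $t1
cpu.fetch(program)
cpu.decode()
cpu.execute()
cpu.write_back()
print(cpu.regs[10], cpu.pc)            # prints 12 4
```

If `decode` only kept `a` and `b` as locals, `execute` would have nothing to work with and would have to redo the decode and register read itself, which is exactly the situation the latches avoid.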

For example, in an R-type add instruction, extracting the rs, rt & rd register numbers from the instruction, as well as reading registers rs & rt, is usually done in a decode cycle. Later cycles need those register values, and if they had to go back to the register file to get them, that would cost valuable time in the cycle. The rs & rt register numbers needed to reread the register file could also be obtained by re-decoding the instruction, but if you follow that logic all the way back to instruction fetch, you've basically got a processor that does all the work from scratch in the last cycle; you might as well have a single-cycle processor.
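The field extraction a decode cycle performs can be sketched as bit slicing of the 32-bit instruction word, using the standard MIPS R-type layout (opcode, rs, rt, rd, shamt, funct). The `RTypeFields` and `decode_r_type` names are hypothetical; the point is that this work is done once, and its results are latched rather than recomputed in later cycles.

```python
from typing import NamedTuple


class RTypeFields(NamedTuple):
    opcode: int
    rs: int
    rt: int
    rd: int
    shamt: int
    funct: int


def decode_r_type(word: int) -> RTypeFields:
    """Split a 32-bit MIPS R-type instruction into its fields."""
    return RTypeFields(
        opcode=(word >> 26) & 0x3F,  # bits 31..26
        rs=(word >> 21) & 0x1F,      # bits 25..21
        rt=(word >> 16) & 0x1F,      # bits 20..16
        rd=(word >> 11) & 0x1F,      # bits 15..11
        shamt=(word >> 6) & 0x1F,    # bits 10..6
        funct=word & 0x3F,           # bits 5..0
    )


# add $t2, $t0, $t1  ->  rs = 8 ($t0), rt = 9 ($t1), rd = 10 ($t2), funct = 0x20
fields = decode_r_type(0x01095020)
print(fields)
```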

So, these intermediate cycle or stage registers are for holding the results that the next cycle needs, so that the next cycle can get started with its work at the very start of the cycle, without repeating work done by previous cycles.

The target register number (sometimes rd and sometimes rt) is decoded in a decode cycle/stage and then later used in a write back cycle/stage. This is yet another example of the intermediate state (the number for the target register) that needs to be forwarded from one cycle to another in order for that instruction's execution to proceed in later cycles without redoing work previously completed.
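A small sketch of choosing the target register once during decode and reusing it at write-back. It assumes, as a simplification, that the instruction is either R-type (writes rd) or an I-type that writes rt, such as `lw`; stores, branches and jumps are not modelled. The helper `destination_register` and the variable `dest_latch` are hypothetical names, and the loaded value is made up.

```python
def destination_register(word: int) -> int:
    """Return the register number a MIPS instruction writes back to.

    Simplified: assumes the instruction is either R-type (writes rd) or an
    I-type that writes rt, such as lw or addi. Stores, branches and jumps
    are not modelled.
    """
    opcode = (word >> 26) & 0x3F
    if opcode == 0:                # R-type: destination is rd, bits 15..11
        return (word >> 11) & 0x1F
    return (word >> 16) & 0x1F     # register-writing I-type: rt, bits 20..16


# Decode cycle: choose the destination once and latch it.
dest_latch = destination_register(0x8D280004)  # lw $t0, 4($t1) -> 8 ($t0)
# ... address calculation and memory read happen in between ...
# Write-back cycle: use the latched number instead of decoding again.
regs = [0] * 32
loaded_value = 0x1234                          # hypothetical value read from memory
regs[dest_latch] = loaded_value
```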

In a multicycle processor, if an instruction register is needed, it would likely sit between fetch and decode, and it might be forwarded further to other cycles/stages. More likely, though, certain control signals and instruction fields are forwarded individually, rather than the whole instruction being forwarded and kept available for reinterpretation.
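To illustrate the alternative of forwarding individual fields and control signals instead of the whole instruction, here is a hedged sketch of what a decode-to-execute latch might contain. The `DecodeToExecuteLatch` fields and the string-valued `alu_op` control signal are assumptions made for the example, and only R-type `add` is decoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecodeToExecuteLatch:
    """Hypothetical latch contents: decoded pieces, not the raw instruction."""
    a: int           # value read from register rs
    b: int           # value read from register rt
    dest_reg: int    # rd or rt, already chosen during decode
    alu_op: str      # control signal, e.g. "add"
    reg_write: bool  # control signal: does write-back update the register file?


def decode(ir: int, regs: List[int]) -> DecodeToExecuteLatch:
    """Decode one instruction word into the latch (R-type add only)."""
    opcode = (ir >> 26) & 0x3F
    rs, rt, rd = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    funct = ir & 0x3F
    if opcode == 0 and funct == 0x20:  # R-type add
        return DecodeToExecuteLatch(regs[rs], regs[rt], rd, "add", True)
    raise NotImplementedError(f"opcode {opcode:#x} not modelled in this sketch")


regs = [0] * 32
regs[8], regs[9] = 5, 7              # $t0 = 5, $t1 = 7
latch = decode(0x01095020, regs)     # add $t2, $t0, $t1
print(latch)
```

Later cycles then read `latch.dest_reg` or `latch.reg_write` directly; nothing downstream needs the original instruction bits.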