Tags: cpu, cpu-architecture, clock, cpu-registers, system-clock

Why does the single-cycle processor not incur register latency on both read and write?


I wonder why the final register-write latency (200 ps) is not added.

To be more precise: the critical path is determined by the load instruction's latency, so why is the critical path not

I-Mem + Regs + Mux + ALU + D-Mem + MUX + Regs

but is actually

I-Mem + Regs + Mux + ALU + D-Mem + MUX


Background

Figure 4.2

In the following three problems, assume that we are starting with a datapath from Figure 4.2, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively.

I found the following solution:

Cycle time without improvement = I-Mem + Regs + Mux + ALU + D-Mem + Mux = 400 + 200 + 30 + 120 + 350 + 30 = 1130 ps

Cycle time with improvement = 1130 + 300 = 1430 ps
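The two candidate critical paths can be checked with a few lines of Python. The latencies are the ones given in the exercise; `LATENCY_PS`, `path_latency`, and the path lists are hypothetical names for illustration, and the +300 ps is taken from the solution as-is, since the improvement itself isn't described here.

```python
# Component latencies in picoseconds, as given in the exercise.
LATENCY_PS = {
    "I-Mem": 400,
    "Add": 100,
    "Mux": 30,
    "ALU": 120,
    "Regs": 200,
    "D-Mem": 350,
    "Control": 100,
}


def path_latency(components):
    """Sum the latencies of the blocks a signal passes through, in order."""
    return sum(LATENCY_PS[c] for c in components)


# Load path as used in the solution: the register write is NOT on the path.
LOAD_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]
# The path proposed in the question, which charges Regs a second time for the write.
LOAD_PATH_WITH_WRITE = LOAD_PATH + ["Regs"]

cycle_time = path_latency(LOAD_PATH)
print(cycle_time)                          # 1130
print(path_latency(LOAD_PATH_WITH_WRITE))  # 1330
print(cycle_time + 300)                    # 1430, with the solution's +300 ps
```

The whole disagreement is the trailing 200 ps: the answer below explains why it belongs to no cycle's combinational path.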


Solution

  • It is a good question whether the cycle needs two Regs latencies.

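    The register write is a capture of the output of one cycle. It happens at the end of one clock cycle and the start of the next: it is the clock edge (the transition to the next cycle) that causes the capture. So the write does not add a second Regs latency to the cycle; the write data only has to be present at the register file's input when that edge arrives (ignoring setup time, as this exercise does).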

    In one sense, writing one instruction's output effectively happens in parallel with the early operations of the next instruction, including its register reads. The only requirement for this overlap is that the next instruction must read the prior instruction's output rather than a stale register value. That holds because the written data is captured at the edge, so it is already in the register at the very beginning of the current cycle (in fact it was ready at the register's input before the transition).
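    A minimal behavioural sketch of that capture, in Python. `RegisterFile`, `present_write`, and `clock_edge` are hypothetical names, and the model deliberately ignores timing: it only shows that a write changes state at the clock edge, so a read in the same cycle still sees the old value while a read in the following cycle sees the new one. Register 4 is `$a0` in MIPS numbering.

```python
class RegisterFile:
    """Toy edge-triggered register file (hypothetical model: one write port,
    combinational reads that see only the state captured at the last edge)."""

    def __init__(self, n=32):
        self.regs = [0] * n
        self._pending = None  # (index, value) sitting at the write port this cycle

    def read(self, idx):
        # Combinational read: returns whatever was captured at the last clock edge.
        return self.regs[idx]

    def present_write(self, idx, value):
        # During the cycle the write data merely sits at the input; no state changes.
        self._pending = (idx, value)

    def clock_edge(self):
        # The edge captures the pending write; this is the entire "write" operation.
        if self._pending is not None:
            idx, value = self._pending
            self.regs[idx] = value
            self._pending = None


rf = RegisterFile()
# Cycle 1: a load produces 42 for register 4 and presents it at the write port.
rf.present_write(4, 42)
print(rf.read(4))  # 0: still the old value during cycle 1
rf.clock_edge()    # end of cycle 1 == start of cycle 2
# Cycle 2: the next instruction's register read sees the new value right away.
print(rf.read(4))  # 42
```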

    The PC works the same way: at the clock transition from one cycle's end to the next cycle's start, the new PC value is captured and then released to I-Mem. So the read and the write effectively happen in parallel, and the only requirement is that the value sent to I-Mem is the new one.
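    The same idea applied to the PC, as a hypothetical toy model: `PCRegister` has `d`/`q` fields named after a flip-flop's input and output, the start address is arbitrary, and branches are ignored. The next-PC logic may change `d` at any point in the cycle, but I-Mem only ever sees `q`, which changes only at the edge.

```python
class PCRegister:
    """Toy edge-triggered PC register (hypothetical model, no timing)."""

    def __init__(self, start):
        self.q = start  # output: the address I-Mem sees for the whole cycle
        self.d = start  # input: driven by the next-PC logic

    def clock_edge(self):
        # At the transition, capture the new PC and release it to I-Mem.
        self.q = self.d


pc = PCRegister(start=0x400000)  # arbitrary start address for the example
trace = []
for _ in range(3):
    pc.d = pc.q + 4     # next-PC logic settles early in the cycle...
    trace.append(pc.q)  # ...but I-Mem still sees the old PC until the edge
    pc.clock_edge()
print([hex(a) for a in trace])  # ['0x400000', '0x400004', '0x400008']
```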

    This is the fundamental way that cycles work: registered values start a cycle, then combinational logic computes new values that are captured at the end of the cycle (i.e., the start of the next cycle) and form the program state available at the start of the new cycle. So one cycle does state -> processing -> (new) state, and then the cycle repeats.
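    The state -> processing -> (new) state loop can be written almost literally. This is a sketch under assumptions: a made-up toy instruction format `(rd, rs, imm)` meaning `regs[rd] = regs[rs] + imm`, eight registers, and no timing. `next_state` plays the role of the combinational logic and never mutates its input; the assignment in `run` stands in for the clock edge, at which the PC update and the register write are committed together.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy instruction: (rd, rs, imm) meaning regs[rd] = regs[rs] + imm.
Instr = Tuple[int, int, int]


@dataclass(frozen=True)
class State:
    pc: int
    regs: Tuple[int, ...]


def next_state(s: State, imem: List[Instr]) -> State:
    """Combinational logic: compute the new state from the current one.
    Nothing here mutates s; the 'write' is just part of the returned value."""
    rd, rs, imm = imem[s.pc // 4]
    regs = list(s.regs)
    regs[rd] = s.regs[rs] + imm  # reads see only the state committed at the last edge
    return State(pc=s.pc + 4, regs=tuple(regs))


def run(imem: List[Instr], cycles: int) -> State:
    s = State(pc=0, regs=(0,) * 8)
    for _ in range(cycles):
        s = next_state(s, imem)  # this assignment is the clock edge
    return s


prog = [(1, 0, 5), (2, 1, 10), (1, 2, 1)]  # r1 = 5; r2 = r1 + 10; r1 = r2 + 1
final = run(prog, 3)
print(final.pc, final.regs[1], final.regs[2])  # 12 16 15
```

The second instruction reads the `r1` written by the first without any extra step, because that write became part of the state at the edge between them.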


    In the case of the PC, you might ask why we need a register at all.

    (For the 32 CPU registers the need is obvious: they give the machine-code program lasting values. One instruction may write a register, say $a0, and that register may be read many instructions later, or even read many times before it is changed.)

    But consider what might happen without a PC register (and the clocking that governs its capture). We don't want the PC to change until the instruction has completed, and completion is dictated by the clock. Without the clock and the register, the PC could run ahead of the rest of the design, since much of the PC computation is not on the critical path, and this would make the design unstable. Since we want the PC to hold steady for the whole clock cycle and change only when the clock says the instruction is over, we use a register (and its clocked update).
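    To put a rough number on "much of the PC computation is not on the critical path", assume that for a non-branch instruction the next PC comes from one Add (PC + 4) followed by one Mux; that path is an assumption about the datapath, not something the exercise states. With the exercise's latencies, the next PC is ready long before the 1130 ps load path finishes, which is why a register is needed to hold the PC until the edge.

```python
# Latencies (ps) from the exercise.
ADD_PS, MUX_PS = 100, 30
CYCLE_PS = 1130  # load critical path computed in the question

# Assumption: next PC = PC + 4 through one Add, then the branch-select Mux.
next_pc_ready = ADD_PS + MUX_PS
slack = CYCLE_PS - next_pc_ready
print(next_pc_ready, slack)  # 130 1000
```

Roughly 1000 ps of slack means that, without a register, the address driving I-Mem would change long before the current instruction had finished.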