
Why is the controller latency not accounted for in this question?



The proposed answer is (a.):

a. 30 (PC read) + 250 (IM) + 25 (mux) + 150 (RF) + 25 (mux) + 200 (ALU) + 25 (mux) + 20 (setup) = 725 ps

b. 30 + 250 + 25 + 150 + 25 + 200 + 250 + 25 + 20 = 975 ps

c. 30 + 250 + 25 + 150 + 25 + 200 + 250 = 930 ps

d. 30 + 250 + 25 + 150 + 25 + 200 + 5 + 5 + 25 + 20 = 735 ps

e. 30 + 250 + 50 + 150 + 25 + 20 = 525 ps

f. 30 + 250 + 25 + 150 + 25 + 200 + 25 + 20 = 725 ps

g. 975 ps

Datapath


As you can see, the proposed answer never accounts for the latency of the controller. Similarly, the latency of the sign extend is not accounted for in part "f".

My solution for part "a" would be exactly the same as the proposed answer, except that I would add 50 ps for the controller; for part "f" I would also add 50 ps for the sign extend.

So is the proposed answer correct? Or am I?


Solution

  • You can compute the timing for every subcomponent, working mostly left to right.  From that you can compute the maximum time needed for any cycle, or evaluate under a constraint such as considering R-type instructions only.

    The approach to computing the timing of a subcomponent is:

    • endTime = startTime + latency, and
    • startTime = max(the endTimes of all inputs to the component)
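
    The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is deliberately partial (PC, Instruction Memory, and the PC+4 adder), the component names and the `end_times` helper are made up for this sketch, and the latencies are the ones quoted in this thread (30ps, 250ps, 150ps).

```python
# Latencies in ps, as quoted in this thread; the graph is deliberately partial.
latency = {"PC": 30, "InstructionMemory": 250, "PCPlus4Adder": 150}

# inputs[c] lists the components whose outputs feed c (forward edges only;
# back edges such as register writes are not listed as inputs).
inputs = {"PC": [], "InstructionMemory": ["PC"], "PCPlus4Adder": ["PC"]}


def end_times(latency, inputs):
    """Return {component: endTime} using startTime = max(endTime of inputs)."""
    done = {}

    def end_time(name):
        if name not in done:
            # A component with no inputs starts at time 0.
            start = max((end_time(src) for src in inputs[name]), default=0)
            done[name] = start + latency[name]
        return done[name]

    for name in latency:
        end_time(name)
    return done


times = end_times(latency, inputs)
print(times)  # {'PC': 30, 'InstructionMemory': 280, 'PCPlus4Adder': 180}
```

    Extending the two dictionaries with the remaining components of the diagram would give the endTime of every subcomponent in the same way.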

    We have to treat the back edges in the graph specially.  They are register writes that require setup time but do not count in the list of inputs to a component.  There are two back edges in this diagram: one for writing to the PC and the other for writing to the register file.  (Both stand out from the other datapaths because they use arrows/lines that go right to left, forming loops.)  Both of these writes occur at exactly the end of the cycle, as the rising clock edge triggers the capture of data from the back edge into the register.
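
    One way to picture the role of the back edges: each one contributes "arrival time of the data + setup time", and the cycle must be long enough for the slowest of them. The sketch below assumes that reading. The register-file arrival time is the sum of the stages listed in proposed answer (a) before its setup term; the PC arrival time (adder output through one 25ps mux) and the `min_cycle_time` helper are hypothetical, chosen only to show the shape of the calculation.

```python
def min_cycle_time(back_edges):
    """back_edges: {register: (arrival_ps, setup_ps)} -> minimum clock period in ps."""
    return max(arrival + setup for arrival, setup in back_edges.values())


back_edges = {
    # Stages from proposed answer (a), excluding the final setup term: 705 ps.
    "RegisterFile": (30 + 250 + 25 + 150 + 25 + 200 + 25, 20),
    # Assumption for illustration only: PC+4 adder output (180 ps) via one 25 ps mux.
    "PC": (180 + 25, 20),
}

cycle = min_cycle_time(back_edges)
print(cycle)  # 725
```

    With these numbers the register-file write determines the cycle time, and the PC write finishes with time to spare.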

    Since the PC subcomponent has no inputs at the start of this cycle, we can set its startTime to 0.  Its latency is 30ps (read register), so its endTime is 0+30ps = 30ps.

    Given that, we can determine the endTime of the Instruction Memory and the PC+4 adder by using 30ps as their startTime and adding their respective latencies.

    The PC+4 adder gets good input at 30ps, so that's its startTime, and its latency is 150ps; it generates a good output at 180ps.

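    Similarly, the Instruction Memory has a latency of 250ps, so it offers good output at 280ps.  The mux in front of the register file follows the Instruction Memory (as in the proposed answer's chain of PC read, IM, mux, RF), so it gets good input at 280ps, and its latency is 25ps; thus that mux offers good output by 305ps.

    The register file has three inputs relevant here, attached to components having endTimes of 280ps, 305ps, and 280ps, respectively; thus startTime for the register file component is 305ps...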

    And so on.  As you go further, you'll see that the latency of some quicker components is hidden by the latency of other slower components or paths.
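
    To make the "hidden latency" point concrete, here is a small sketch. It assumes a hypothetical component with two inputs: the PC+4 adder output (ready at 180ps, from the walkthrough above) and the ALU result (the stages of proposed answer (a) up to and including the ALU). Whether these two signals actually meet at one component in the diagram is an assumption, and `start_time_and_slack` is a made-up helper.

```python
def start_time_and_slack(arrivals):
    """arrivals: {input_name: endTime in ps of the component driving that input}.

    Returns the component's startTime and, per input, how long that input
    sits idle waiting for the slowest one.
    """
    start = max(arrivals.values())
    slack = {name: start - t for name, t in arrivals.items()}
    return start, slack


arrivals = {
    "pc_plus_4": 30 + 150,                          # PC read + adder = 180 ps
    "alu_result": 30 + 250 + 25 + 150 + 25 + 200,   # answer (a) up to the ALU = 680 ps
}

start, slack = start_time_and_slack(arrivals)
print(start)  # 680
print(slack)  # {'pc_plus_4': 500, 'alu_result': 0}
```

    The faster input adds nothing to the startTime; only the input with zero slack lies on the path that determines the cycle time.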