Search code examples
mips

Determining the amount of stalls needed for MIPS program


Consider the following instructions:

addi $t0, $zero, 15

mystery:
addi $t0, $t0, -5
bne $t0, $zero, mystery

boo:
addi $t0, $t0, 5
beq $t0, $zero, boo
lw $t0, 4($t1)

The above MIPS instructions executed by the 5-stage pipeline datapath with forwarding, with untaken branch prediction, and decision making is moved to ID stage.

I got this question in a practice and I still don't fully understand it. How do I find the amount of stalls needed in clock cycles.

I got the total amount of instructions as 10 and the ideal pipeline as 14, but I don't understand the full concept of stall any help would be appreciated

I've been spitballing at this point, I tried just stalling at the branch operations but I just can't seem to grasp the concept of stall.


Solution

  • For conditional branches, taken branches & not-taken branches have different stall requirements.

    So, you need to identify how many taken branches vs. not taken branches in the complete code execution.

    And there are only 2 taken branches (backward in mystery loop which runs 3x, meaning jumps backward twice) — and otherwise 2 not-taken branches, one which exits each loop.

    Untaken branches are said to be "predicted".  That is informative though isn't very complex, it just means the processor just goes along forward assuming the branch isn't taken, i.e. as if it weren't even a branch instruction, e.g. as if it were a nop.  And it will run full speed if that "prediction" is correct.

    However, for taken branches then, it has to nullify any work along the predicted not-taken path since that path is incorrect for the program.  That means it will loose any cycles that were put into those instruction on the wrong path.

    In addition, for taken branches, here there are data hazards.  Since in your code snippet, both branch instructions suffer RAW data hazard, understanding that will inform how many instructions are consumed along the wrong path, e.g. how many wasted cycles will have to be discarded, and these will be called stalls.

    The result of addi is known immediately after the addi's EX stage has completed.

    And the branch instruction is said to be able to determine whether it was correct in its prediction in its ID stage, then it will (attempt to) decide whether to continue at full speed or cancel the incorrect mispredicted not-taken code path.

    However, the ID stage for the branch instruction (where it would prefer to make the taken/not taken decision) overlaps with the EX stage of the prior addi instruction, so the actual determination of the branch prediction simply cannot happen before or during addi's EX, so would have to happen in the cycle after addi's EX.

    Looking at these instructions:

    mystery:
    addi $t0, $t0, -5
    bne $t0, $zero, mystery
    
    boo:
    addi $t0, $t0, 5
    beq $t0, $zero, boo
    
    Cycle IF ID EX MEM WB
    1 addi -5
    2 bne addi -5
    3 addi +5 bne addi -5
    4 beq addi +5 bne addi -5

    We can see that the bne cannot make a final determination as it might prefer, due to the RAW hazard on $t0 with the immediately preceding instruction, so it has to wait until the addi -5 has completed EX and thus the next cycle, when the bne is in EX itself.

    Thus, two instructions along the mis-predicted non-taken path have been entered into the pipeline and they must both be cancelled.  The redirection has to cancel those instructions (on the incorrect path) and reset the next PC to the proper (taken) branch target.  This will occur in cycle 4, so by cycle 5 the processor should be on the proper code path, having lost 2 cycles (to the not-taken code path, addi +5 and beq), which we can refer to as stalls.


    Let's note that were it not for the RAW data hazard, the processor would have been able to make the final branch path determination in the bne's actual ID stage; thus, having consumed only one instruction on the incorrect path (the addi +5) before getting back on track with the proper path.

    And to reiterate, not-taken branches run like nop or other arithmetic instructions, full speed (i.e. no stalls).