How to draw data dependency waits when drawing a 5 stage pipeline diagram?

How to draw data dependency waits when drawing a 5 stage pipeline diagram? Is my answer correct?

Instructions

1. ADD X, Y, Y
2. ADD Z, Y, X
3. SUB V, X, W
4. ADD Z, Z, V

Here is the Pipeline diagram

Is my pipeline diagram correct?

Solution

I assume that the instruction format is op <dest>, <src1>, <src2>

We can track the dependency explicitly

1. ADD X, Y, Y    Depends on: Y     Produces: X
2. ADD Z, Y, X    Depends on: Y, X  Produces: Z
3. SUB V, X, W    Depends on: X, W  Produces: V
4. ADD Z, Z, V    Depends on: Z, V  Produces: Z

Knowing that there is no operand forwarding, and instruction i can read a result of an instruction j only after j completes the WR stage.

ADD X, Y, Y depends on Y but no instruction produces it, it is already ready. Thus it can start executing just after being fetched and decoded.
So it starts on cycle 1 and ends on cycle 5

ADD Z, Y, X is fetched as soon as the fetch unit is free, this happen at cycle 2. It is then decode as soon as the decode unit is free after that, at cycle 3.
It cannot be executed right after, on cycle 4, because it depends on Y and X, the latter being produced by instruction 1 and thus available only at cycle 6.
Note here that the stage that is stalling is not IF but ID. It is this stage that is prevented from moving data on the next one. Imagine there is a latch between each stage.
IF can starts fetching the new instruction on the next cycle but it cannot complete (i.e. must be stalled too) until ID is allowed to write its output (otherwise the input of ID would change and so would its output).

SUB V, X, W can be fetched at cycle 3 because the fetching of instruction 2 ends at cycle 3 and so the fetch unit is free.
However the fetch unit cannot write the result in the latch between itself and the decode unit because the decode unit is stalled and must keep its current input value (or the decode would change).
So the fetch must be stalled too, until the decode unit writes to its output latch at cycle 6.
The decode can thus start at cycle 6.
The instruction depends on X and W, the former is available at cycle 6 and the latter since cycle 1; thus the EX can starts just after ID, i.e. at cycle 7.

ADD Z, Z, V can start being fetched at cycle 6, when the fetch unit had written its result to the latch to the decode unit.
The execution can start only at cycle 10 because this instruction depends on V that is produced by instruction 3 at cycle 10.
The dependency on Z is not a problem since this data is available at cycle 9.