Why does MIPS use one delay slot instead of two?

This seems to be the case in many RISC architectures. Since filling one delay slot saves us 50% of otherwise wasted cycles, why not give the programmer a chance to use both slots?

Solution

On the MIPS R2000, the classic MIPS I implementation that the ISA was designed around, one branch-delay slot is enough to hide branch latency; see "How does MIPS I forward from EX to ID for branches without stalling?". Being able to check branch conditions with low latency (half a clock cycle) is why MIPS conditional branches are limited to eq/ne, or to comparing one register against zero (such as checking the sign bit for x<0), not an arbitrary x<y.
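To make the cycle counting concrete, here is a minimal sketch assuming a textbook five-stage pipeline (IF, ID, EX, MEM, WB) in which the branch target can be fetched in the cycle after the branch resolves. It is a simplified model, not the exact R2000 timing, and the function names are made up for illustration.

```python
# Toy model of a textbook 5-stage pipeline (an assumption, not exact R2000 timing).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_latency(resolve_stage: str) -> int:
    """Instructions fetched after a branch before its target can be fetched.

    A branch resolved in ID has 1 younger instruction already in IF;
    a branch resolved in EX has 2 (one in ID, one in IF).
    """
    return STAGES.index(resolve_stage)


def wasted_cycles(resolve_stage: str, delay_slots: int, filled_slots: int) -> int:
    """Cycles per taken branch that do no useful work (no branch prediction).

    delay_slots  -- architectural delay slots exposed by the ISA
    filled_slots -- how many of them the compiler filled with useful work
    """
    if not 0 <= filled_slots <= delay_slots:
        raise ValueError("filled_slots must be between 0 and delay_slots")
    # Latency not covered by delay slots shows up as pipeline bubbles.
    stall = max(0, branch_latency(resolve_stage) - delay_slots)
    # Every unfilled delay slot holds a NOP, which still takes a cycle.
    nops = delay_slots - filled_slots
    return stall + nops


if __name__ == "__main__":
    # Branch resolved in ID: one filled slot hides the latency completely.
    print(wasted_cycles("ID", delay_slots=1, filled_slots=1))
    # Branch resolved in EX: one slot is no longer enough.
    print(wasted_cycles("EX", delay_slots=1, filled_slots=1))
```

In this model, resolving the branch early (in ID) is exactly what makes a single slot sufficient; a second slot only helps if the hardware really has two cycles of branch latency.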

In "Is that true if we can always fill the delay slot there is no need for branch prediction?", Paul Clayton answers that yes, filling the branch-delay slot in asm makes branch prediction useless on early MIPS. So that's more evidence that the real commercial MIPS R2000 worked that way. You only need branch prediction if you lengthen the pipeline, like they did for the next-gen MIPS R4000. And then you have branch prediction instead of expecting compilers to statically fill the branch latency.

But anyway, one branch-delay slot is hard enough for compilers to fill reliably, and in hindsight it is baggage for future implementations that use branch prediction instead of depending on a delay slot to hide branch latency, especially superscalar implementations.

Making it architectural burdens all future CPUs with it, if they want to be binary-compatible. So even if early MIPS CPUs did have 2 cycles of branch latency, some foresight and cost/benefit consideration make it pretty reasonable to only expose 1 cycle architecturally.
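The cost/benefit trade-off can be sketched numerically. The model below is a rough illustration, not a measurement: the fill probabilities are made-up placeholders, the function name is hypothetical, and it assumes every unfilled delay slot becomes a NOP while any branch latency beyond the architectural slots becomes a stall.

```python
def expected_wasted_cycles(hw_branch_latency: int, fill_probabilities: list[float]) -> float:
    """Expected wasted cycles per branch, without branch prediction.

    hw_branch_latency  -- cycles of branch latency in this implementation
    fill_probabilities -- one entry per architectural delay slot: the chance
                          the compiler finds useful work for that slot
                          (hypothetical numbers, not measured data)
    """
    for p in fill_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    slots = len(fill_probabilities)
    stall = max(0, hw_branch_latency - slots)              # latency the ISA does not cover
    expected_nops = sum(1.0 - p for p in fill_probabilities)  # unfilled slots
    return stall + expected_nops


if __name__ == "__main__":
    one_slot = [0.5]         # placeholder: first slot filled half the time
    two_slots = [0.5, 0.25]  # placeholder: second slot is harder to fill

    for latency in (2, 1):
        print(latency,
              expected_wasted_cycles(latency, one_slot),
              expected_wasted_cycles(latency, two_slots))
```

With these placeholder numbers, a second slot helps a little on hardware that has two cycles of branch latency, but costs more than it saves on any implementation with one cycle of latency, because the usually unfilled second slot is still executed on every branch.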