The Intel Architecture Developer's Manual (Vol. 3A, Section 8-26) says:
The Pentium processor and more recent processor families use branch-prediction techniques to improve performance by prefetching the destination of a branch instruction before the branch instruction is executed. Consequently, instruction execution is not deterministically serialized when a branch instruction is executed.
What does this mean?
It sounds really, really bad. It sounds like a serializing instruction such as CPUID breaks branch prediction (or vice versa), but that seems unlikely. Can any ASM folks help me understand what "not deterministically serialized" means in this context?
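It's very confusingly worded, but I believe the actual meaning is simple: "branches do not (necessarily) serialize execution". We take this for granted today, but it was not always so. On the Intel486, for example, a jump flushed the instruction prefetch unit, so a jump placed right after a write that modified code guaranteed the new bytes were the ones fetched. With branch prediction, the destination may already have been prefetched before the branch executes, so a branch can no longer be relied on as that kind of synchronization point.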
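In practice this means that if you actually need serialization, you ask for it explicitly instead of relying on a branch. Here is a minimal sketch of that idea, assuming GCC or Clang extended inline asm on an x86 target. The names `serialize_execution`, `read_tsc`, `timed_cycles` and `sample_work` are made up for illustration, and on non-x86 builds the helpers compile to stubs that return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#define X86_BUILD 1
#else
#define X86_BUILD 0
#endif

/* Hypothetical helper: execute CPUID, which is a serializing instruction.
 * Returns 1 if CPUID was executed, 0 on non-x86 builds (stub). */
static int serialize_execution(void) {
#if X86_BUILD
    unsigned int eax = 0, ebx, ecx = 0, edx;
    /* volatile + "memory" clobber keep the compiler from removing or
     * reordering memory accesses around the instruction. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx; (void)edx;
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: read the time-stamp counter (0 on non-x86 builds). */
static uint64_t read_tsc(void) {
#if X86_BUILD
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Time fn() in TSC ticks. The explicit CPUID, not the call/return
 * branches around fn, is what keeps earlier or later instructions from
 * overlapping with the timed region. The result includes the cost of
 * the second CPUID. */
static uint64_t timed_cycles(void (*fn)(void)) {
    assert(fn != NULL);
    serialize_execution();
    uint64_t start = read_tsc();
    fn();
    serialize_execution();
    uint64_t end = read_tsc();
    return end - start;
}

/* Some throwaway work to time. */
static void sample_work(void) {
    volatile unsigned int sum = 0;
    for (unsigned int i = 0; i < 1000; ++i) {
        sum += i;
    }
}
```

Calling `timed_cycles(sample_work)` gives a tick count for the loop plus the overhead of the second CPUID. The point is only that the ordering guarantee comes from the serializing instruction; the branches involved in calling `fn` promise nothing of the sort.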