
Intel JCC Erratum - should JCC really be treated separately?


Intel pushed a microcode update to fix an error called the "Jump Conditional Code (JCC) Erratum". The updated microcode makes some operations inefficient because, under certain conditions, it prevents code from being cached in the Decoded ICache (the uop cache).

The published document, titled "Mitigations for Jump Conditional Code Erratum", lists not only JCC but also unconditional jumps, conditional jumps, macro-fused conditional jumps, calls, and returns.

MSVC switch /QIntel-jcc-erratum documentation mentions:

Under /QIntel-jcc-erratum, the compiler detects jump and macro-fused jump instructions that cross or end on a 32-byte boundary.

The questions are:

  • Are there reasons to treat JCC separately from other jumps?
  • Are there reasons to mention macro-fused JCC separately from other JCC?

Solution

  • Macro-fused jumps have to be mentioned separately because fusion makes the whole cmp/jcc pair (or whatever instruction fuses with the jcc) vulnerable to this slowdown: the penalty applies if the cmp touches the boundary even when the jcc itself doesn't. The uop cache holds a single uop for both of those x86 machine instructions together, with the start address of the non-jump instruction.

    If everyone only said "jumps", you'd expect that only the JCC / JMP / CALL / RET itself had to avoid touching a 32B boundary. So it's a good thing to highlight the interaction with macro-fusion.
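The boundary rule described above can be sketched in a few lines of Python. This is only an illustration of the "cross or end on a 32-byte boundary" condition, not Intel's or any compiler's actual logic. The function names, the example addresses, and the instruction lengths are all made up for the example.

```python
BOUNDARY = 32  # the mitigation is described in terms of 32-byte aligned chunks


def touches_boundary(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """True if the bytes [start, start + length) cross a boundary or end on one."""
    last = start + length - 1                    # address of the final byte
    crosses = start // boundary != last // boundary
    ends_on = (last + 1) % boundary == 0         # final byte is the last byte of a chunk
    return crosses or ends_on


def fused_pair_affected(cmp_start: int, cmp_len: int, jcc_len: int) -> bool:
    """A macro-fused cmp+jcc is a single uop, so check the pair as one span
    that starts at the non-jump instruction (hypothetical helper)."""
    return touches_boundary(cmp_start, cmp_len + jcc_len)


# Hypothetical layout: a 3-byte cmp at 0x1e followed by a 2-byte jcc at 0x21.
jcc_alone = touches_boundary(0x21, 2)            # the jcc itself stays clear of 0x20
fused = fused_pair_affected(0x1e, 3, 2)          # but the cmp straddles 0x20
```

In this made-up layout the jcc on its own would pass the check, while the fused pair fails it because the cmp straddles address 0x20. That is the case a bare "jumps" wording would miss.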


    This slowdown (for all jumps) is the result of a microcode mitigation / workaround for a hardware design flaw. Not being able to cache jumps that touch a 32-byte boundary in the uop cache is not the original erratum; it's a side effect of the cure.
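Because the penalty depends only on where the jump's bytes land, software can sidestep it by shifting the jump, which is the kind of adjustment the /QIntel-jcc-erratum switch quoted in the question is for. Here is a minimal sketch of the arithmetic, assuming the span (a jump, or a fused pair) is shorter than 32 bytes. The function name is hypothetical, and how a real toolchain chooses the padding bytes is outside this sketch.

```python
def padding_needed(start: int, length: int, boundary: int = 32) -> int:
    """Smallest number of pad bytes to insert before a span so that it
    neither crosses nor ends on a boundary (illustrative helper only)."""
    if not 0 < length < boundary:
        raise ValueError("span must be non-empty and shorter than the boundary")
    # The span is safe when the byte after it is still in the same chunk
    # as its first byte.
    if start // boundary == (start + length) // boundary:
        return 0
    # Otherwise the only fix is to push the start up to the next boundary.
    return -start % boundary


pad_for_jcc = padding_needed(0x1e, 2)    # 2-byte jump ending on the boundary
pad_for_fused = padding_needed(0x1e, 5)  # 5-byte fused pair crossing the boundary
pad_for_safe = padding_needed(0x21, 2)   # already clear of any boundary
```

Any smaller pad would leave the start in the same 32-byte chunk with the span still reaching the boundary, so moving the start to the next boundary is the minimum.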

    That original erratum description doesn't say anything about affecting only conditional branches. Even if it was only conditional branches that were a real problem, perhaps the best way Intel could find to make it safe with a microcode update unfortunately affected all jumps.

    For example, in Skylake-Xeon (SKX), the original erratum is documented as SKX102 in Intel's "spec update" errata list for that uarch:

    SKX102. Processor May Behave Unpredictably on Complex Sequence of Conditions Which Involve Branches That Cross 64 Byte Boundaries

    Problem: Under complex micro-architectural conditions involving branch instructions bytes that span multiple 64 byte boundaries (cross cache line), unpredictable system behavior may occur.

    Implication: When this erratum occurs, the system may behave unpredictably.

    Workaround: It is possible for BIOS to contain a workaround for this erratum. [i.e. a microcode update]

    Status: No fix.


    I suspect the "JCC erratum" name caught on because most branches in "hot" code paths are conditional. Compilers can usually avoid putting unconditional taken branches in the fast path. So it's likely that people noticed the performance problem with JCC instructions first, and that name simply stuck even though it's not accurate.

    BTW, the Q&A "32-byte aligned routine does not fit the uops cache" has a screenshot of the relevant diagram from the Intel PDF you linked, along with some other links and details about the performance effects.