assembly x86 cpu-architecture amd-processor

Assembly instructions showing how Zenbleed was found


While reading this Zenbleed article, I saw that the bug was found because a randomly generated sequence of instructions, and the same sequence with randomized alignment, serialization and speculation fences added, produced final states that didn't match.

For example:

Original Code              Fuzzed Code
--------------------       --------------------
movnti [rbp+0x0],ebx       movnti [rbp+0x0],ebx
                           sfence
rcr dh,1                   rcr dh,1
                           lfence
sub r10, rax               sub r10, rax
                           mfence
rol rbx, cl                rol rbx, cl
                           nop
xor edi,[rbp-0x57]         xor edi,[rbp-0x57]

The article says that such a mismatch could indicate a bug:

If the final states don’t match, then there must have been some error in how they were executed micro-architecturally - that could indicate a bug.

Notes

As developers we monitor the macro-architectural state, that’s just things like register values. There is also the micro-architectural state which is mostly invisible to us, like the branch predictor, out-of-order execution state and the instruction pipeline.

Question

Are there any situations where mismatched final states are not a bug when executed micro-architecturally?


Solution

  • when executed micro-architecturally

    This phrasing doesn't make sense. Every instruction sequence has to get executed by the microarchitecture (CPU hardware design). The CPU hardware doesn't have any other way to run machine code.

    Any observable (architectural, not timing) results always need to match what would happen if the instructions executed one at a time, in program order (except for memory contents observed from other threads). That is, out-of-order execution has to preserve the illusion of a serial execution model.
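
The serial-execution illusion can be sketched with a toy model. This Python snippet is only an illustration under stated assumptions, not a description of real hardware: the three-operand `dst = src1 + src2` instruction format, the register names, and the helpers `run`, `depends` and `reorder` are all hypothetical. It issues instructions in a shuffled order that still respects data hazards, then checks that the final register state equals the in-order result. Real out-of-order CPUs also rename registers to remove WAR/WAW hazards, which this sketch does not model.

```python
import random


def run(program):
    """Execute instructions one at a time, in list order; return final registers."""
    regs = {"a": 1, "b": 2, "c": 3, "d": 4}
    for dst, src1, src2 in program:  # hypothetical format: dst = src1 + src2
        regs[dst] = regs[src1] + regs[src2]
    return regs


def depends(earlier, later):
    """True if `later` must stay after `earlier` (RAW, WAR or WAW hazard)."""
    e_dst, *e_srcs = earlier
    l_dst, *l_srcs = later
    return e_dst in l_srcs or l_dst in e_srcs or e_dst == l_dst


def reorder(program, rng):
    """Issue instructions in a random order that respects every hazard."""
    pending = list(program)
    issued = []
    while pending:
        # An instruction is ready when no earlier, still-pending instruction
        # has a hazard against it. Index 0 is always ready.
        ready = [
            i for i, insn in enumerate(pending)
            if not any(depends(pending[j], insn) for j in range(i))
        ]
        issued.append(pending.pop(rng.choice(ready)))
    return issued


program = [("a", "a", "b"), ("c", "c", "d"), ("b", "a", "a"), ("d", "c", "c")]
shuffled = reorder(program, random.Random(1))
print(run(shuffled) == run(program))  # True: same architectural result
```

If `reorder` ignored hazards, the comparison could fail, which is the toy equivalent of hardware breaking the serial illusion.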

    Since lfence, mfence, nop, etc. have no effect on the architectural state (register / memory contents), they shouldn't change anything for single-threaded code. If they do create a difference, that's always a problem. I think that's what you meant to ask, and it's what the quotes are saying.
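
The check described above (run a sequence with and without padding instructions, then compare final states) can be sketched as a toy model. This is a minimal illustration, not the article's actual fuzzing harness: the tiny instruction set, the tuple encoding and the helper names `run`, `add_padding` and `final_states_match` are assumptions made up for this example. The padding instructions are modeled as having no architectural effect, so any mismatch would point at a bug in the executor.

```python
import random

# Toy model only: these names stand in for the padding instructions from the
# question. None of them changes registers in this architectural model.
PADDING = ("nop", "lfence", "sfence", "mfence")
MASK64 = (1 << 64) - 1


def run(program):
    """Execute instruction tuples in program order; return final registers."""
    regs = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0}
    for op, *args in program:
        if op in PADDING:
            continue  # no architectural effect
        dst, src = args
        # The source operand is either a register name or an immediate.
        value = regs[src] if isinstance(src, str) else src
        if op == "mov":
            regs[dst] = value & MASK64
        elif op == "add":
            regs[dst] = (regs[dst] + value) & MASK64
        elif op == "sub":
            regs[dst] = (regs[dst] - value) & MASK64
        elif op == "xor":
            regs[dst] ^= value & MASK64
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


def add_padding(program, rng):
    """Return a copy of `program` with random padding after some instructions."""
    fuzzed = []
    for insn in program:
        fuzzed.append(insn)
        if rng.random() < 0.5:
            fuzzed.append((rng.choice(PADDING),))
    return fuzzed


def final_states_match(program, seed=0):
    """Oracle: the original and the padded program must end in the same state."""
    rng = random.Random(seed)
    return run(program) == run(add_padding(program, rng))


program = [
    ("mov", "rax", 5),
    ("mov", "rbx", 7),
    ("add", "rax", "rbx"),
    ("xor", "rdx", "rax"),
    ("sub", "rbx", 1),
]
print(final_states_match(program))  # True in this toy model
```

On real hardware the two runs would be native code and the state snapshot would cover registers and memory; the comparison logic stays the same.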


    There are instructions like rdtsc and rdpmc that read a timestamp or performance counter; those will of course give different results when you put slow instructions (which affect the TSC) or extra instructions/uops (which affect performance counters) into a sequence. rdpmc is essentially reading microarchitectural counters into architectural state (register values), and rdtsc is reading time, so there was never any expectation that they'd give the same results with or without serialization.
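
The rdtsc exception can be illustrated with another toy model. Everything here is an assumption for the sake of the example: the `ticks` counter is a hypothetical stand-in for the TSC that advances by one per instruction, the toy `rdtsc` writes only `rax` (the real instruction writes edx:eax), and `same_state` is a made-up helper. The sketch shows that adding a fence legitimately changes a time-derived register, so a state comparison has to ignore such registers.

```python
# Toy model only: padding instructions have no architectural effect,
# but they still take time.
PADDING = ("nop", "lfence", "sfence", "mfence")


def run(program):
    """In-order toy executor; `ticks` is a hypothetical stand-in for the TSC."""
    regs = {"rax": 0, "rbx": 0}
    ticks = 0
    for op, *args in program:
        ticks += 1  # every instruction, padding included, costs one tick here
        if op in PADDING:
            continue
        if op == "rdtsc":
            regs["rax"] = ticks  # time leaks into architectural state
        elif op == "mov":
            dst, imm = args
            regs[dst] = imm
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


def same_state(a, b, ignore=()):
    """Compare two register states, skipping registers listed in `ignore`."""
    return all(a[r] == b[r] for r in a if r not in ignore)


original = [("mov", "rbx", 42), ("rdtsc",)]
padded = [("mov", "rbx", 42), ("lfence",), ("rdtsc",)]

plain, fenced = run(original), run(padded)
print(same_state(plain, fenced))                   # False: rax holds a timestamp
print(same_state(plain, fenced, ignore=("rax",)))  # True: everything else matches
```

A fuzzer using this kind of oracle would either avoid generating timing instructions or mask their destination registers before comparing.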