
Difference between LLCMisses and CacheMisses on Hardware Counters


What is the difference between LLCMisses and CacheMisses?


Solution

  • The value returned for both counters should generally be the same.

    The counters available in BenchmarkDotNet are those provided by the Windows ETW infrastructure. Unfortunately, so far as I am aware, Microsoft does not publish any specific information about any of them, but we can reasonably infer quite a bit from the ones we see.

    On the Intel systems I have seen full PMC source listings for, the list ends with 8 entries with sequential IDs. The first seven of those eight (UnhaltedCoreCycles, InstructionRetired, UnhaltedReferenceCycles, LLCReference, LLCMisses, BranchInstructionRetired, BranchMispredictsRetired) pretty much exactly match the names and order of the seven Intel Architectural Performance Event counters (see the Performance Monitoring chapter of the Intel Software Developer's Manual for details).

    The last of the 8, LbrInserts, likely refers to the Intel Last Branch Record performance monitoring functionality. So it appears reasonable to presume these sources directly map to those specific x86 counters, and they will not be present on architectures without them.

    Of the other 5 sources listed, TotalIssues returns the same values as InstructionRetired, BranchInstructions matches BranchInstructionRetired, CacheMisses matches LLCMisses, BranchMispredictions matches BranchMispredictsRetired, and TotalCycles matches UnhaltedCoreCycles.
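
    The pairings above can be written down as a small lookup table. This is only a sketch restating the observed matches, not anything exposed by BenchmarkDotNet or ETW; the names GENERIC_TO_INTEL and same_underlying_counter are made up for illustration.

```python
# Observed pairings from the answer: generic source name -> the Intel
# architectural source that returned the same values on the systems tested.
# This table is an assumption based on observation, not a documented mapping.
GENERIC_TO_INTEL = {
    "TotalIssues": "InstructionRetired",
    "BranchInstructions": "BranchInstructionRetired",
    "CacheMisses": "LLCMisses",
    "BranchMispredictions": "BranchMispredictsRetired",
    "TotalCycles": "UnhaltedCoreCycles",
}


def same_underlying_counter(a: str, b: str) -> bool:
    """Return True if both source names were observed to report the same values."""

    def resolve(name: str) -> str:
        # Generic names resolve to their Intel counterpart; others stay as-is.
        return GENERIC_TO_INTEL.get(name, name)

    return resolve(a) == resolve(b)


print(same_underlying_counter("CacheMisses", "LLCMisses"))     # True
print(same_underlying_counter("CacheMisses", "LLCReference"))  # False
```

    On a non-Intel architecture the right-hand side of such a table would presumably differ, which is the point of having the generic names at all.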

    Presumably, other CPU architectures have their own architecture-specific sources defined, and the generic sources are mapped to different architecture-specific counters there. For example, BranchMispredictions on ARM might map to the BR_MIS_PRED counter, which does not have the same semantics as Intel's Branch Mispredicts Retired but still represents the concept of branch misprediction.

    So the actual answer is: if you are distributing software with a predefined counter choice, pick LLCMisses if you want the specific meaning of the Intel counter. If you just want the concept of a cache miss, pick CacheMisses, so that it might also work on other architectures with different performance counters. And if you're just running it locally, it doesn't really matter which you pick.
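
    As a rough sketch, that decision rule looks like the following. The function name and both parameters are hypothetical, introduced only to spell out the reasoning; the returned strings are the counter names discussed in this answer.

```python
def pick_cache_miss_counter(distributed: bool, need_intel_semantics: bool) -> str:
    """Choose a counter name following the rule of thumb in the answer.

    distributed: the counter choice ships inside software run on other machines.
    need_intel_semantics: you want exactly Intel's LLC Misses architectural event.
    """
    if distributed and need_intel_semantics:
        # Tied to the Intel counter; may be absent on other architectures.
        return "LLCMisses"
    # The generic name may map to a comparable counter elsewhere, and on a
    # local Intel machine it is expected to report the same values anyway.
    return "CacheMisses"


print(pick_cache_miss_counter(distributed=True, need_intel_semantics=True))   # LLCMisses
print(pick_cache_miss_counter(distributed=True, need_intel_semantics=False))  # CacheMisses
```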