What is the difference between LLCMisses and CacheMisses?
The value returned for both counters should generally be the same.
The counters available in BenchmarkDotNet are those provided by the Windows ETW infrastructure. Unfortunately, as far as I am aware, Microsoft does not document any of them specifically, but we can reasonably infer quite a bit from the ones we see.
On the Intel systems I have seen full PMC source listings for, the list ends with 8 entries with sequential IDs. The first seven of those eight (UnhaltedCoreCycles, InstructionRetired, UnhaltedReferenceCycles, LLCReference, LLCMisses, BranchInstructionRetired, BranchMispredictsRetired) pretty much exactly match the names and order of the seven Intel Architectural Performance Event counters (see the Performance Monitoring chapter of the Intel Software Developer's Manual for details).
The last of the 8, LbrInserts, likely refers to the Intel Last Branch Record performance monitoring functionality. So it appears reasonable to presume these sources map directly to those specific x86 counters, and that they will not be present on architectures without them.
Of the other 5 sources listed, TotalIssues returns the same values as InstructionRetired; BranchInstructions matches BranchInstructionRetired, CacheMisses matches LLCMisses, BranchMispredictions matches BranchMispredictsRetired, and TotalCycles matches UnhaltedCoreCycles.
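The pairings above can be written down as a small lookup table. This is only a sketch of the observed equivalences on Intel hardware, not anything taken from Microsoft documentation; the names `GENERIC_TO_INTEL`, `INTEL_TO_GENERIC`, and `equivalent_source` are hypothetical helpers introduced for illustration.

```python
# Generic ETW source name -> Intel-specific source observed to report the same values.
# These pairs come from the observations in this answer, not from official documentation.
GENERIC_TO_INTEL = {
    "TotalIssues": "InstructionRetired",
    "BranchInstructions": "BranchInstructionRetired",
    "CacheMisses": "LLCMisses",
    "BranchMispredictions": "BranchMispredictsRetired",
    "TotalCycles": "UnhaltedCoreCycles",
}

# Reverse direction, built from the same pairs.
INTEL_TO_GENERIC = {intel: generic for generic, intel in GENERIC_TO_INTEL.items()}


def equivalent_source(name):
    """Return the other source name observed to report the same values, or None."""
    return GENERIC_TO_INTEL.get(name) or INTEL_TO_GENERIC.get(name)
```

Sources with no observed counterpart, such as LbrInserts, LLCReference, and UnhaltedReferenceCycles, simply return `None` here.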
Presumably, other CPU architectures have their own architecture-specific sources defined, with those sources mapped to different architecture-specific counters, e.g. BranchMispredictions on ARM might map to the BR_MIS_PRED counter, which does not have the same semantics as Intel's Branch Mispredicts Retired, but still represents the concept of branch misprediction.
So the actual answer is: if you are distributing software with a predefined counter value, pick LLCMisses if you want the specific meaning of the Intel counter. If you just want the concept of a cache miss, pick CacheMisses so that it might also work on other architectures with different performance counters. And if you're just running it locally, it doesn't really matter which you pick.