intel perf amd-processor tlb

What is the meaning of Perf events: dTLB-loads and dTLB-stores?


I'm trying to understand the meaning of the perf events dTLB-loads and dTLB-stores.


Solution

  • When virtual memory is enabled, the virtual address of every memory access needs to be looked up in the TLB to obtain the corresponding physical address and determine access permissions and privileges (or raise an exception in case of an invalid mapping). The dTLB-loads and dTLB-stores events represent a TLB lookup for a data memory load or store access, respectively. This is the perf definition of these events, but the exact meaning depends on the microarchitecture.

    On Westmere, Skylake, Kaby Lake, Coffee Lake, Cannon Lake (and probably Ice Lake), dTLB-loads and dTLB-stores are mapped to MEM_INST_RETIRED.ALL_LOADS and MEM_INST_RETIRED.ALL_STORES, respectively. On Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Goldmont, Goldmont Plus, they are mapped to MEM_UOP_RETIRED.ALL_LOADS and MEM_UOP_RETIRED.ALL_STORES, respectively. On Core2, Nehalem, Bonnell, Saltwell, they are mapped to L1D_CACHE_LD.MESI and L1D_CACHE_ST.MESI, respectively. (Note that on Bonnell and Saltwell, the official names of the events are L1D_CACHE.LD and L1D_CACHE.ST and the event codes used by perf are only documented in the Intel manual Volume 3 and not in other Intel sources on performance events.) The dTLB-loads and dTLB-stores events are not supported on Silvermont and Airmont.

    On all current AMD processors, dTLB-loads is mapped to LsDcAccesses and dTLB-stores is not supported. However, LsDcAccesses counts TLB lookups for both loads and stores. On processors from other vendors, dTLB-loads and dTLB-stores are not supported.

    See Hardware cache events and perf for how to map perf core events to native events.

    The dTLB-loads and dTLB-stores event counts for the same program can differ across microarchitectures not only because the microarchitectures differ but also because the events themselves mean different things. Therefore, even if the microarchitectural behavior of the program were the same on two microarchitectures, the event counts could still be different. A brief description of the native events on all Intel microarchitectures can be found here and a more detailed description on some of the microarchitectures can be found here.

    Related: how to interpret perf iTLB-loads,iTLB-load-misses.
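
The mapping described in this answer can be sketched as a small lookup, shown here in Python purely for illustration. The numeric constants and the `id | (op << 8) | (result << 16)` encoding come from the Linux `perf_event_open(2)` interface (`linux/perf_event.h`); the native event names are transcribed from the answer above. The names `hw_cache_config`, `GENERIC_CONFIG`, `NATIVE_EVENTS` and `native_event` are hypothetical helpers, not part of perf, and the table is only as complete as the answer (Ice Lake is left out because the answer only says "probably").

```python
# Constants from linux/perf_event.h (perf_event_open(2)).
PERF_TYPE_HW_CACHE = 3                 # perf_event_attr.type for cache events
PERF_COUNT_HW_CACHE_DTLB = 3           # perf_hw_cache_id
PERF_COUNT_HW_CACHE_OP_READ = 0        # perf_hw_cache_op_id
PERF_COUNT_HW_CACHE_OP_WRITE = 1
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0  # perf_hw_cache_op_result_id


def hw_cache_config(cache_id, op_id, result_id):
    """Build perf_event_attr.config for a generic hardware cache event."""
    return cache_id | (op_id << 8) | (result_id << 16)


# Generic (microarchitecture-independent) encoding of the two events.
GENERIC_CONFIG = {
    "dTLB-loads": hw_cache_config(
        PERF_COUNT_HW_CACHE_DTLB,
        PERF_COUNT_HW_CACHE_OP_READ,
        PERF_COUNT_HW_CACHE_RESULT_ACCESS,
    ),
    "dTLB-stores": hw_cache_config(
        PERF_COUNT_HW_CACHE_DTLB,
        PERF_COUNT_HW_CACHE_OP_WRITE,
        PERF_COUNT_HW_CACHE_RESULT_ACCESS,
    ),
}

# (native event for dTLB-loads, native event for dTLB-stores);
# None means "not supported", as stated in the answer.
_GROUPS = [
    (("Westmere", "Skylake", "Kaby Lake", "Coffee Lake", "Cannon Lake"),
     ("MEM_INST_RETIRED.ALL_LOADS", "MEM_INST_RETIRED.ALL_STORES")),
    (("Sandy Bridge", "Ivy Bridge", "Haswell", "Broadwell",
      "Goldmont", "Goldmont Plus"),
     ("MEM_UOP_RETIRED.ALL_LOADS", "MEM_UOP_RETIRED.ALL_STORES")),
    (("Core2", "Nehalem", "Bonnell", "Saltwell"),
     ("L1D_CACHE_LD.MESI", "L1D_CACHE_ST.MESI")),
    (("Silvermont", "Airmont"), (None, None)),
    # "AMD" stands in for "all current AMD processors" in the answer;
    # LsDcAccesses counts lookups for both loads and stores.
    (("AMD",), ("LsDcAccesses", None)),
]

NATIVE_EVENTS = {name: pair for names, pair in _GROUPS for name in names}


def native_event(uarch, generic_name):
    """Return the native event a generic dTLB event maps to, or None."""
    if generic_name not in GENERIC_CONFIG:
        raise ValueError(f"unknown generic event: {generic_name}")
    loads, stores = NATIVE_EVENTS[uarch]
    return loads if generic_name == "dTLB-loads" else stores


if __name__ == "__main__":
    for event, config in GENERIC_CONFIG.items():
        print(event, hex(config), native_event("Skylake", event))
```

The generic `config` value is the same on every machine; only the native event it resolves to changes, which is why counts for the same generic name are not directly comparable across microarchitectures.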