I'm looking for AMD-specific performance counters that count the cycles consumed by page walks when TLB misses occur. I know Intel has such metrics available.
But do such counters exist on AMD? I looked in http://developer.amd.com/wordpress/media/2013/12/56255_OSRR-1.pdf but didn't find anything close to what I need.
I also looked in the perf source code, https://elixir.bootlin.com/linux/latest/source/arch/x86/events/amd/core.c#L248, and it does not seem to have them either.
Maybe they have different names? Any suggestions?
It seems to me you're looking for events similar to Intel's *.WALK_DURATION or *.WALK_ACTIVE on AMD Zen processors. There are no events with exactly the same meaning, but there are similar ones.
The closest events are the IBS performance data fields IbsTlbRefillLat and IbsItlbRefillLat, which measure the number of cycles it takes to fulfill an L1 DTLB or L1 ITLB miss, respectively, when the selected uop or instruction fetch misses. Note that in perf record, IbsTlbRefillLat can be captured with the ibs_op PMU and IbsItlbRefillLat can be captured with the ibs_fetch PMU.
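If you end up with raw IBS register values rather than decoded fields, the two latencies can be pulled out with a few bit operations. The following is only a sketch: the bit positions (IbsTlbRefillLat in bits 63:48 of IBS Op Data 3, IbsItlbRefillLat in bits 15:0 of IBS Fetch Control Extended) are my reading of the Family 17h manual and should be checked against the PPR for your processor, and the function names are made up for illustration.

```python
# ASSUMED bit layouts; verify against the PPR for your processor family:
#   IBS Op Data 3:              IbsTlbRefillLat  in bits 63:48
#   IBS Fetch Control Extended: IbsItlbRefillLat in bits 15:0


def extract_bits(value: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of value as an unsigned integer."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def ibs_tlb_refill_lat(ibs_op_data3: int) -> int:
    """L1 DTLB refill latency in cycles (hypothetical helper)."""
    return extract_bits(ibs_op_data3, 63, 48)


def ibs_itlb_refill_lat(ibs_fetch_ctl_ext: int) -> int:
    """L1 ITLB refill latency in cycles (hypothetical helper)."""
    return extract_bits(ibs_fetch_ctl_ext, 15, 0)
```

The latency fields are only meaningful for samples that actually missed in the corresponding L1 TLB, so a real tool would also check the miss indicator of each sample before using the value.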
The event Core::X86::Pmc::Core::LsTwDcFills is also useful. It counts the number of L1 data cache fills for page table walks that miss in the L1, broken down by data source (local L2, L3 on the same die, L3 on another die, DRAM or IO on the same die, DRAM or IO on another die). Walks fulfilled from farther sources are more expensive and would probably have a larger impact on performance. This event doesn't count walks that hit in the L1 data cache, although there are other events that count L2 TLB misses. Also, this event only counts walks caused by L2 DTLB misses, not ITLB misses.
In current versions of the upstream kernel, LsTwDcFills is not listed by perf list, so perf doesn't know the event by name. You'll have to specify the event code using the syntax cpu/event=0x5B,umask=0x0/. This event represents any page table walk for a data load or store for which there is an allocated MAB (meaning that the walker missed in the L1D). You can filter the count according to the response by specifying an appropriate umask value as defined in the manual. For example, the event cpu/event=0x5B,umask=0x48/ represents a walk where the response came from local or remote main memory.
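To avoid hand-computing umask values, the raw event string can be assembled from named sources. This is a rough sketch, not an official mapping: the event code 0x5B and the combined value 0x48 for local plus remote main memory come from the text above, but the individual bit assignments and the source names in SOURCE_BITS are my assumptions and must be checked against the manual for your processor.

```python
LS_TW_DC_FILLS_EVENT = 0x5B  # event code for LsTwDcFills, as given above

# ASSUMED per-source umask bits; only the combination 0x48 (local or remote
# DRAM/IO) is confirmed by the text. Verify the rest in the manual.
SOURCE_BITS = {
    "local_l2": 0x01,
    "same_die_cache": 0x02,
    "same_die_dram_io": 0x08,
    "remote_die_cache": 0x10,
    "remote_die_dram_io": 0x40,
}


def ls_tw_dc_fills_event(*sources: str) -> str:
    """Build a raw perf event string that ORs the umask bits of the sources."""
    umask = 0
    for source in sources:
        umask |= SOURCE_BITS[source]  # KeyError on an unknown source name
    return f"cpu/event=0x{LS_TW_DC_FILLS_EVENT:02X},umask=0x{umask:02X}/"


# Walks answered from main memory on either die:
print(ls_tw_dc_fills_event("same_die_dram_io", "remote_die_dram_io"))
```

The resulting string can be passed to perf stat or perf record with -e.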
One good approach for using these monitoring facilities as a small part of your overall microarchitectural performance analysis methodology is to first monitor LsTwDcFills. If it exceeds some threshold relative to the total number of memory accesses (excluding instruction fetches), then capture IbsTlbRefillLat for sampled uops to locate where in your code these expensive walks are occurring. Similarly, for instruction fetch walks, use the event Core::X86::Pmc::Core::BpL1TlbMissL2Miss to count total walks, since a walk is only needed when the fetch misses in both the L1 and L2 ITLB. If the count is too large relative to total fetches, use IbsItlbRefillLat to locate where in your code the most expensive walks are occurring.
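The first step of this approach amounts to a simple ratio check on two counter values. Here is a minimal sketch of it, assuming you have already collected the walk count and the total access count (for example with perf stat). The function names and the default threshold of 1% are hypothetical placeholders, since the right threshold depends on your workload.

```python
def walk_ratio(walks: int, total_accesses: int) -> float:
    """Fraction of accesses (data accesses or fetches) that caused a counted walk."""
    if total_accesses <= 0:
        raise ValueError("total_accesses must be positive")
    return walks / total_accesses


def needs_ibs_sampling(walks: int, total_accesses: int,
                       threshold: float = 0.01) -> bool:
    """Decide whether to follow up with IBS sampling.

    threshold is a HYPOTHETICAL default (1%); tune it for your workload.
    """
    return walk_ratio(walks, total_accesses) > threshold


# Example with made-up counts: 50 walks per 1000 accesses is 5%.
print(needs_ibs_sampling(50, 1000))
```

The same check applies to both the data side and the instruction fetch side; only the counters fed into it differ.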