When profiling some CUDA applications, I see that the local hit rate (the local_hit_rate metric) is 0%.
I want to distinguish between the following two cases that could produce that value:
1. The application never accesses the local cache.
2. All accesses to the local cache were misses.
How can I tell which case applies? Since the values of inst_compute_ld_st, ldst_issued and ldst_executed are non-zero, is it safe to rule out the first case, or is there something else I should check?
The device is an M2000, which is CC 5.2.
nvprof supports both events (raw counters) and metrics. You can list the ones available on your device with the following commands:
    nvprof --query-events
    nvprof --query-metrics
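Once you know which names your device supports, you can collect only the ones relevant to this question. The wrapper below is a hypothetical sketch: collect_local_metrics and LOCAL_METRICS are illustrative names, it assumes nvprof is on PATH, and it assumes your device exposes local_hit_rate, local_load_transactions and local_store_transactions (confirm with nvprof --query-metrics first).

```shell
# Local-memory metric names nvprof exposes on CC 5.x/6.x devices (assumed);
# verify them on your GPU with `nvprof --query-metrics` before relying on them.
LOCAL_METRICS="local_hit_rate,local_load_transactions,local_store_transactions"

# collect_local_metrics APP [ARGS...]
# Hypothetical wrapper: runs APP under nvprof and reports the metrics above
# for each kernel it launches. Extra arguments are passed through to APP.
collect_local_metrics() {
  if [ "$#" -lt 1 ]; then
    echo "usage: collect_local_metrics APP [ARGS...]" >&2
    return 2
  fi
  nvprof --metrics "$LOCAL_METRICS" "$@"
}
```

Usage: collect_local_metrics ./my_app, where ./my_app is a placeholder for your own executable. Collecting the transaction counts alongside the hit rate is what makes a 0% value interpretable.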
CC 5.x/6.x Local Memory Metrics
local_*_request is the number of executed instructions that access local memory through either the generic address space or the local address space. On CC 5.x/6.x, I do not recall whether this count includes fully predicated-off instructions.
local_*_transactions is the number of cache accesses generated by those requests, which depends on the size of each request (32-bit, 64-bit, ...) and on its address divergence. If this value is non-zero, local memory was accessed.
l2_local_*_bytes is the number of bytes of local data loaded from or stored to the L2 cache.
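Applied to the original question, the transaction counts are enough to tell the two cases apart: if they are zero there were no local accesses, and if they are non-zero local memory was accessed, so a 0% hit rate means every counted access missed. The helper below is a hypothetical POSIX shell sketch of that decision (classify_zero_hit_rate is an illustrative name, not an nvprof feature); it assumes you pass the integer local_load_transactions and local_store_transactions values nvprof reported for the same kernel whose local_hit_rate was 0%.

```shell
# classify_zero_hit_rate LOAD_TX STORE_TX
# Hypothetical helper for a kernel whose local_hit_rate was reported as 0%.
# LOAD_TX / STORE_TX: integer local_load_transactions and
# local_store_transactions values nvprof reported for that same kernel.
classify_zero_hit_rate() {
  if [ "$#" -ne 2 ]; then
    echo "usage: classify_zero_hit_rate LOAD_TX STORE_TX" >&2
    return 2
  fi
  if [ $(( $1 + $2 )) -eq 0 ]; then
    # No local transactions at all: the 0% reflects an absence of local accesses.
    echo "no local memory access"
  else
    # Transactions occurred, so local memory was accessed; a 0% hit rate
    # then means none of the counted accesses hit in the cache.
    echo "local memory accessed; all local accesses missed"
  fi
}
```

For example, classify_zero_hit_rate 0 0 prints "no local memory access", while classify_zero_hit_rate 128 0 prints "local memory accessed; all local accesses missed".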