A Pascal SM has 4 schedulers. When we get something like 3 as the IPC, does it mean that 3 instructions were scheduled by the SM in a cycle? Also, does NVPROF average the IPC of all SMs for that kernel and report that value?
If one SM has an IPC of 3, then a GPU with 2 SMs should give us an IPC of 6, right?
Also, I am working with a simulator called GPGPU-Sim, which reports IPC in much higher ranges (80-120). I assume that it calculates IPC per core and scales the metric to the whole simulated GPU, but I am not sure.
Can someone please verify the IPC metric?
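The NVPROF ipc metric is calculated as SUM(sm_inst_executed) / SUM(sm_active_cycles).
This results in the average IPC of a single SM. It is not scaled by the number of SMs, so two SMs that each run at an IPC of 3 are reported as an IPC of 3, not 6. Maxwell/Pascal SMs have a maximum SM IPC of 6. Volta/Turing SMs have a maximum SM IPC of 4.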
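To make the aggregation concrete, here is a minimal Python sketch, assuming you already have per-SM values of sm_inst_executed and sm_active_cycles. The function name and the counter values are hypothetical; only the formula comes from the answer above.

```python
def nvprof_style_ipc(inst_executed, active_cycles):
    """Average per-SM IPC: SUM(sm_inst_executed) / SUM(sm_active_cycles).

    inst_executed and active_cycles are hypothetical per-SM counter lists,
    one entry per SM, in the same SM order.
    """
    if len(inst_executed) != len(active_cycles):
        raise ValueError("need one counter pair per SM")
    total_cycles = sum(active_cycles)
    if total_cycles == 0:
        return 0.0
    return sum(inst_executed) / total_cycles


# Two SMs, each executing 3 warp instructions per active cycle.
print(nvprof_style_ipc([3000, 3000], [1000, 1000]))  # 3.0, not 6.0
```

Because both sums are taken before dividing, SMs that were active for more cycles carry more weight: per-SM IPCs of 4 (4000/1000) and 1 (500/500) give 4500/1500 = 3.0 rather than the simple mean of 2.5.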
sm_inst_executed - The number of warp instructions executed, counted at the point where the instruction must complete (it can no longer be rolled back due to speculative execution). Instructions that are fully predicated off are still counted.
sm_active_cycles - The number of cycles the SM had at least 1 active/resident warp.
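The active-cycle definition can be illustrated with a small Python sketch over a hypothetical per-cycle trace of resident-warp counts for one SM. Real hardware exposes only the final counter, not such a trace; the function name is also hypothetical.

```python
def active_cycles(resident_warps_per_cycle):
    """Count cycles in which the SM had at least one active/resident warp."""
    return sum(1 for warps in resident_warps_per_cycle if warps >= 1)


# Hypothetical 8-cycle trace: the SM is idle at the start and the end.
trace = [0, 0, 4, 4, 3, 2, 0, 0]
print(active_cycles(trace))  # 4
```

The idle cycles at either end do not count, so an SM that is starved of work does not pull its own ipc value down.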
NVIDIA Perfworks provides the following metrics:
sm[sp]__inst_executed_{avg, sum}_per_{active, elapsed}_cycle
The _sum variant is the total IPC (the maximum is SM_COUNT * SM_MAX_IPC). The _avg variant is the average per-SM IPC (SUM(sm__inst_executed) / SUM(sm__{active, elapsed}_cycle)). The _elapsed_cycle variants also include cycles in which the SM is not active.
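The four variants can be compared with a minimal Python sketch over hypothetical per-SM counters. It assumes that the _sum variant equals the _avg value multiplied by the SM count (total instructions divided by the per-SM average cycle count), which matches the stated maximum of SM_COUNT * SM_MAX_IPC but is not taken from Perfworks documentation. The class, function, and dictionary key names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmCounters:
    """Hypothetical per-SM counter values for one kernel launch."""
    inst_executed: int   # warp instructions executed on this SM
    active_cycles: int   # cycles with at least one resident warp
    elapsed_cycles: int  # all cycles in the measurement window


def ipc_variants(sms):
    """Compute _avg and _sum IPC over active and elapsed cycles.

    Assumption: _sum = _avg * SM count, i.e. total instructions divided by
    the per-SM average cycle count.
    """
    n = len(sms)
    inst = sum(s.inst_executed for s in sms)
    active = sum(s.active_cycles for s in sms)
    elapsed = sum(s.elapsed_cycles for s in sms)
    avg_active = inst / active
    avg_elapsed = inst / elapsed
    return {
        "avg_per_active_cycle": avg_active,
        "sum_per_active_cycle": avg_active * n,
        "avg_per_elapsed_cycle": avg_elapsed,
        "sum_per_elapsed_cycle": avg_elapsed * n,
    }


# Two SMs, each active for half of a 2000-cycle window.
sms = [SmCounters(3000, 1000, 2000), SmCounters(3000, 1000, 2000)]
for name, value in ipc_variants(sms).items():
    print(f"{name}: {value}")
```

For these made-up counters the sketch prints 3.0, 6.0, 1.5 and 3.0. With a Pascal SM_MAX_IPC of 6, the _sum variant for two SMs could reach at most 2 * 6 = 12.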