cuda, gpu, nvprof

Understanding the IPC metric from Nvprof and GPGPUsim


A Pascal SM has 4 schedulers. When nvprof reports an IPC of about 3, does that mean 3 instructions were scheduled by the SM per cycle? Also, does nvprof average the IPC over all SMs for that kernel and report the result?

If one SM has an IPC of 3, then a GPU with 2 SMs should give us an IPC of 6, right?

I am also working with a simulator called GPGPU-Sim, which reports IPC in a much higher range (80-120). I assume it calculates IPC per core and scales the metric to the whole simulated GPU, but I am not sure.

Can someone please verify the IPC metric?


Solution

  • NVPROF ipc metric is calculated as SUM(sm_inst_executed) / SUM(sm_active_cycles)

    This results in the average IPC of a single SM. Maxwell/Pascal SMs have a maximum SM IPC of 6. Volta/Turing SMs have a maximum SM IPC of 4.

    sm_inst_executed - The number of warp instructions executed counted at the point where the instruction must complete (cannot be rolled back due to speculative execution). Fully predicated off instructions are counted.

    sm_active_cycles - The number of cycles the SM had at least 1 active/resident warp.

    NVIDIA Perfworks provides the following metrics:

    sm[sp]__inst_executed_{avg, sum}_per_{active, elapsed}_cycle.

    The _sum variant is the total IPC (max is SM_COUNT * SM_MAX_IPC). The _avg variant is the average IPC (SUM(sm__inst_executed) / SUM(sm__{active, elapsed}_cycle)). The elapsed_cycles variant includes cycles in which the SM is not active.
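
The difference between the average and the total IPC can be sketched in a few lines of Python. This is only an illustration, not how nvprof or Perfworks is implemented: the function names and the counter values are hypothetical, and the total is computed here as the sum of per-SM IPC values, which is an assumption about how the _sum variant aggregates.

```python
from typing import Sequence


def average_sm_ipc(inst_executed: Sequence[int],
                   active_cycles: Sequence[int]) -> float:
    """Average IPC of a single SM: SUM(inst_executed) / SUM(active_cycles).

    Each sequence holds one counter value per SM.
    """
    total_cycles = sum(active_cycles)
    if total_cycles == 0:
        return 0.0  # no SM was ever active
    return sum(inst_executed) / total_cycles


def total_gpu_ipc(inst_executed: Sequence[int],
                  active_cycles: Sequence[int]) -> float:
    """Whole-GPU IPC, taken here as the sum of per-SM IPC values.

    Assumption: this is one plausible reading of the _sum variant.
    SMs with zero active cycles contribute nothing.
    """
    return sum(inst / cycles
               for inst, cycles in zip(inst_executed, active_cycles)
               if cycles > 0)


if __name__ == "__main__":
    # Hypothetical counters for a 2-SM GPU where each SM runs at IPC 3.
    inst = [3000, 3000]
    cycles = [1000, 1000]
    sm_max_ipc = 6  # Maxwell/Pascal value quoted above

    print("average SM IPC:", average_sm_ipc(inst, cycles))
    print("total GPU IPC:", total_gpu_ipc(inst, cycles))
    print("upper bound on total IPC:", len(inst) * sm_max_ipc)
```

With these inputs the average stays at the per-SM value, so the nvprof-style ipc metric does not double when a second SM is added. Only a whole-GPU total scales with the SM count.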