Miscellaneous and Inter-Thread Communication Instructions in CUDA


I've been playing around with the NVIDIA profiler (nvprof) and there are two particular metrics which I do not understand:

inst_inter_thread_communication
    Number of inter-thread communication instructions executed by non-predicated threads
inst_misc
    Number of miscellaneous instructions executed by non-predicated threads

I'm just wondering what instructions would be inter-thread communication instructions and which instructions would fall under miscellaneous.

Reference: http://docs.nvidia.com/cuda/profiler-users-guide/#metrics-reference


Solution

    The SASS instructions that fall into each of the two categories are as follows:

    inst_inter_thread_communication

    • SHFL
    • VOTE

    inst_misc

    • NOP
    • S2R, B2R, R2B, P2R
    • LEPC
    • CSET[P], PSET[P]
    • MOV
    • SEL
    • PRMT
    • BAR, DEPBAR (Maxwell only)
    • Several infrequent, undocumented instructions also increment this counter.

    The Instruction Set Reference section of the CUDA Binary Utilities document contains a brief description of each SASS instruction. There is close to a 1:1 relationship between SASS and PTX, so the PTX ISA manual is also worth reviewing.
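
    To see the inter-thread communication category in practice, here is a minimal sketch of a kernel that uses the warp shuffle and vote intrinsics, which I would expect to compile to SHFL and VOTE respectively. The kernel name `warpComm` and the buffer names are made up for illustration, and the `_sync` intrinsics assume CUDA 9 or later (older toolkits used `__shfl_down` and `__ballot`). The exact SASS emitted depends on the target architecture and compiler version, so treat the mapping as an expectation to verify rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one warp sums its 32 inputs with shuffles and
// records which lanes hold an odd value with a ballot vote.
__global__ void warpComm(const int *in, int *sum, unsigned *ballot)
{
    const unsigned fullMask = 0xffffffffu;  // all 32 lanes participate
    int v = in[threadIdx.x];

    // Vote: bit i is set if lane i holds an odd value (expected: VOTE).
    unsigned odd = __ballot_sync(fullMask, v & 1);

    // Tree reduction across the warp using shuffles (expected: SHFL).
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(fullMask, v, offset);

    // After the loop, lane 0 holds the sum over all 32 lanes.
    if (threadIdx.x == 0) {
        *sum = v;
        *ballot = odd;
    }
}

int main()
{
    int h_in[32];
    for (int i = 0; i < 32; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_sum = nullptr;
    unsigned *d_ballot = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMalloc(&d_ballot, sizeof(unsigned));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warpComm<<<1, 32>>>(d_in, d_sum, d_ballot);  // exactly one warp

    int h_sum = 0;
    unsigned h_ballot = 0;
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_ballot, d_ballot, sizeof(unsigned), cudaMemcpyDeviceToHost);
    printf("sum = %d, ballot = 0x%08x\n", h_sum, h_ballot);

    cudaFree(d_in);
    cudaFree(d_sum);
    cudaFree(d_ballot);
    return 0;
}
```

    After building with nvcc, `cuobjdump -sass` on the executable should show which SASS instructions were actually generated, and `nvprof --metrics inst_inter_thread_communication,inst_misc` on the same executable should report the two counters on GPUs where nvprof supports these metrics.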