After using NSight to profile my CUDA app, I see this under "Issue Efficiency":
After clicking the helpful Help link in the panel pictured above, I found this description in the docs:
Pipeline Busy — The compute resources required by the instruction are not yet available.
Any suggestions on figuring out which compute resources are not yet available, and why?
You can run the Pipe Utilization experiment to see which pipelines are busy. From the User Guide:
Each Streaming Multiprocessor (SM) of a CUDA device features numerous hardware units that are specialized in performing specific tasks. At the chip level those units provide execution pipelines to which the warp schedulers dispatch instructions. For example, texture units provide the ability to execute texture fetches and perform texture filtering. Load/Store units fetch and save data to memory. Understanding the utilization of those pipelines and knowing how close they are to the peak performance of the target device is key to analyzing the efficiency of executing a kernel; it also allows identifying performance bottlenecks caused by oversubscribing a certain type of pipeline.