optimization, cuda, instruction-set, ptx

Is it bad that NVCC generates PTX code that is very generous with registers?


I recently read through the generated PTX code of a CUDA kernel. I realized that many registers are used to just store an intermediate value and are then never used again, and that NVCC generally seems to not care much about register re-use and instead opts to just use a new register at pretty much any point new data is created.

This raises the question: is it worth manually going through the PTX code to minimize register use, or is that something the PTX VM handles at runtime anyway?


Solution

  • This raises the question: is it worth manually going through the PTX code to minimize register use

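    No. nvcc deliberately emits PTX in static single assignment (SSA) form, in which every intermediate value gets its own virtual register. These virtual registers are not hardware registers, so their count in the PTX says nothing about how many registers the kernel occupies on the GPU.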

    or is that something the PTX VM handles at runtime anyway?

    There is no such thing as a “PTX VM”. PTX is always compiled into native machine code (SASS) before it runs on the hardware. Register allocation and register-use optimisation are done statically, when the PTX is assembled. That assembly step runs either as part of an nvcc invocation (ptxas) or inside the GPU driver, which JIT-compiles the PTX at runtime.
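
    To see the difference between PTX virtual registers and the hardware registers actually allocated, you can query the compiled kernel. The following is only a minimal sketch: the kernel saxpy_like and the helper hardware_registers_per_thread are hypothetical names made up for illustration, and the reported count depends on the target architecture and compiler version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel (not from the question). Each intermediate
// (t0, t1, t2) typically gets its own virtual register in the PTX output.
__global__ void saxpy_like(const float* x, float* y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t0 = a * x[i];
        float t1 = t0 + y[i];
        float t2 = t1 * t1;
        y[i] = t2;
    }
}

// Hypothetical helper: returns the number of hardware registers per thread
// that the assembled kernel uses, or -1 if the query fails.
int hardware_registers_per_thread()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, saxpy_like);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    return attr.numRegs;  // register count after allocation, not PTX virtual registers
}

int main()
{
    int regs = hardware_registers_per_thread();
    if (regs < 0) return 1;
    std::printf("saxpy_like uses %d hardware registers per thread\n", regs);
    return 0;
}
```

    Compiling with nvcc -Xptxas -v prints the same per-kernel register count at build time, and nvcc -ptx writes out the PTX so the two can be compared. If the hardware count really is too high, the usual levers are __launch_bounds__ or -maxrregcount rather than hand-editing PTX.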