Search code examples
cudagpuhpcmicro-optimization

Are there performance/storage differences between uint2 and uint64_t in cuda10+?


I'm trying to optimize a piece of code for A100 GPUs (ampere gen), right now we use uint64_t but I am seeing uint2 datatypes being used instead in some cuda code. Does the uint2 offer advantages for register usage? I know there are a limited number of 64-bit registers, does uint2 split the x,y ints across 32-bit registers for better occupancy? I couldn't find any specific information about register storage with these datatypes so any links to documentation for it would be appreciated.


Solution

  • Does the uint2 offer advantages for register usage?

    No.

    I know there are a limited number of 64-bit registers

    Indeed. Extremely limited, i.e. zero. There are no 64 bit registers in any CUDA compatible GPU I am aware of. When the compiler encounters a 64-bit type, it composites it from two adjacent 32-bit registers.

    does uint2 split the x,y ints across 32-bit registers for better occupancy?

    No. All the CUDA built-in vector types exist for memory bandwidth optimization (there are vector load/store instructions in PTX) and for compatibility with the texture/surface hardware which can do filtering on some of those types, which can be better for performance.