
GPU HLSL compute shader warnings int and uint division


I keep getting warnings from compute shader compilation recommending that I use uints instead of ints when dividing.

Based on the data types alone, I would assume uints are faster; however, various tests online seem to point to the contrary. Perhaps that contradiction only holds on the CPU side, and GPU parallelisation gives uints some advantage I'm not aware of? (Or is it just bad advice?)
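For context, a minimal sketch of the kind of code that triggers the warning, alongside the change the compiler suggests (the buffer and thread-group layout here are hypothetical, not from the original question):

```hlsl
// Hypothetical compute shader illustrating the int-vs-uint division warning.
RWStructuredBuffer<int> Data : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    int i = (int)id.x;

    int  row  = i / 256;      // signed division: the pattern the compiler flags
    uint urow = id.x / 256u;  // unsigned division: what the warning recommends

    Data[id.x] = row + (int)urow;
}
```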


Solution

  • I know that this is an extremely late answer, but this is a question that has come up for me as well, and I wanted to provide some information for anyone who sees this in the future.

    I recently found this resource - https://arxiv.org/pdf/1905.08778.pdf

The table at the bottom lists the latency of basic operations on several graphics cards. There is a small but consistent saving to be had by using uints on all measured hardware. However, what the warning doesn't state is that the greater optimization is to be found by replacing division with multiplication if at all possible.
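As a sketch of that replacement, assuming the divisor is a compile-time constant (buffer names here are illustrative, not from the question):

```hlsl
// Hypothetical kernel: replacing per-thread division with multiplication.
RWStructuredBuffer<float> Values : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Slower: division executed by every thread.
    // float a = Values[id.x] / 16.0;

    // Faster: multiply by the reciprocal. Exact here, since 1/16
    // is representable as a power of two in floating point.
    float a = Values[id.x] * 0.0625;

    // For unsigned integers and power-of-two divisors,
    // a shift does the same job as division:
    uint bucket = id.x >> 4;  // equivalent to id.x / 16u

    Values[id.x] = a + (float)bucket;
}
```

For non-constant divisors this rewrite isn't available, which is why the uint-over-int advice is what the compiler falls back to.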

    https://www.slideshare.net/DevCentralAMD/lowlevel-shader-optimization-for-nextgen-and-dx11-by-emil-persson states that type conversion is a full-rate operation like int/float subtraction, addition, and multiplication, whereas division is very slow.

    I've seen it suggested that to improve performance, one should convert to float, divide, then convert back to int, but as shown in the first source, this will at best give you small gains and at worst actually decrease performance.
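For clarity, the suggested pattern looks like this (a hypothetical helper, shown only to make the round-trip explicit):

```hlsl
// The sometimes-suggested pattern: convert to float, divide, convert back.
// Per the measurements above, this is at best a marginal win.
int DivViaFloat(int n, int d)
{
    // The conversions themselves are full-rate, but the float divide is
    // still not free, and the result can be off by one for values that
    // a 32-bit float cannot represent exactly.
    return (int)((float)n / (float)d);
}
```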

You are correct that this differs from the relative performance of the same operations on the CPU, although I'm not entirely certain why.

Looking at https://www.agner.org/optimize/instruction_tables.pdf, it appears that which operation is faster (MUL vs IMUL) varies from CPU to CPU: on a few at the top of the list, IMUL is actually faster despite a higher listed instruction count, while other CPUs make no distinction between MUL and IMUL at all.

TL;DR: uint division is slightly faster on the GPU, but on the CPU, YMMV.