multithreading gpu opencl amd-gpu

Balancing blocks, threads and workgroups?


I have an application (which I did not write myself) that requires three parameters:

  • Blocks
  • Threads
  • Points (the number of calculations per thread, I assume)

It uses OpenCL and I have an RX 580. My current efficiency is low.

The GPU has 2304 stream processors in 36 compute units.

I have played around with different values, but I have no idea what the best starting point would be, because I don't know how blocks and threads relate to the compute units. Any help in understanding how to choose the number of blocks, the number of threads per block, and the number of calculations per thread would be greatly appreciated.

Thank you so much


Solution

  • I'm going to make the same assumptions you have:

    Blocks: Number of workgroups
    Threads: Number of threads
    Points: Some metric of work per thread
    

    It's more important to set the correct workgroup size than the number of workgroups. You want the group size to be at least the SIMD width, and ideally a multiple of it. That width is 32 on NVIDIA GPUs, but AMD GCN cards such as your RX 580 execute 64 work-items per wavefront. So, taking Threads to be the total thread count, Blocks should be set to Threads / 64 on your card.

    For "Points": this depends on how much work is done per "calc". There is overhead in kicking off a workgroup, so you want to make sure each thread has enough work to do. From experience, ~16 instructions is usually enough, but if you can't see the kernel code you will just have to experiment.

    In summary:

    1. Set "Points" so that you have at least 2304 threads for the work you need
    2. Set Blocks to Threads / 64 (Threads / 32 on hardware with a 32-wide SIMD)

    All of this assumes you have at least 2304 work-items; otherwise you are not fully utilising your hardware.
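
To make the arithmetic concrete, here is a rough Python sketch of how you might pick starting values. It is not the application's own logic: `suggest_launch` and its parameter names are hypothetical, `total_work` assumes you know roughly how many calculations the run needs, and the default width of 64 assumes an AMD GCN card such as the RX 580 (pass 32 for hardware with a 32-wide SIMD).

```python
def suggest_launch(total_work, points, simd_width=64, stream_processors=2304):
    """Suggest (blocks, threads, fills_gpu) for a given amount of work.

    total_work        -- assumed total number of calculations to perform
    points            -- calculations per thread (the "Points" parameter)
    simd_width        -- threads per workgroup; 64 is assumed for AMD GCN
    stream_processors -- 2304 on the RX 580
    """
    # Threads needed to cover all the work, rounded up (integer ceiling).
    threads = -(-total_work // points)
    # One workgroup per SIMD-width chunk of threads, rounded up.
    blocks = -(-threads // simd_width)
    # Pad the thread count to a whole number of workgroups.
    threads = blocks * simd_width
    # True when there is at least one thread per stream processor.
    fills_gpu = threads >= stream_processors
    return blocks, threads, fills_gpu


if __name__ == "__main__":
    for work in (10_000, 1_000_000):
        print(work, suggest_launch(work, points=16))
```

If `fills_gpu` comes back `False`, lowering "Points" creates more threads from the same amount of work. Treat the result only as a starting point and measure from there.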