Search code examples
openclgpureduction

OpenCL reduction result wrong with large floats


I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code, so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct.

I couldn't find any error in the code. Is it possible that I am getting wrong results, because of the float type? If this is the case, what is the best approach to solve the issue?


Solution

  • There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.

    The correct sum is: 65537 * 32768 = 2147516416, and it takes 31 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.

    "Any integer with absolute value less than [2^24] can be exactly represented in the single precision format" "Floating Point" article, wikipedia

    This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:

    • use double instead of float if your platform supports it
    • use int or unsigned int
    • sum a smaller set of numbers eg: 0+1+2+...+4095+4096 = (2^23 + 2^11)

    Read more about single precision here.