
strange behavior of powf() function


Unexpectedly, powf produces strange output for odd base numbers when its type is int. For example, powf(-4,2) returns 16, but powf(-5,2) returns 24!

After tracing the root of a wrong result in a long calculation, I found that the powf function behaves strangely for odd numbers when the output type is an integer.

__global__ void intFoo( int* a)
{
    // *a is converted to float, squared with powf, then truncated back to int
    *a = powf(*a, 2);
}
__global__ void doubleFoo( double* a)
{
    // *a is converted to float (powf, not pow), squared, then widened back to double
    *a = powf(*a, 2);
}

I can call these kernels (for instance) in MATLAB:

!nvcc -ptx test.cu 
k1 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'intFoo');
k2 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'doubleFoo');
out1 = feval(k1, -4)
out2 = feval(k1, -5)
out3 = feval(k2, -4)
out4 = feval(k2, -5)

result:

out1 = 16
out2 = 24   % shouldn't this be 25?!
out3 = 16
out4 = 25.000

EDIT:

After investigating in MATLAB per @Robert Crovella's suggestion, I found that the Command Window shows out4 = 25.000, whereas the Variables window shows its actual content, out4 = 24.9999981.

Everyone should be cautious: the output of powf carries a small error (24.9999981 instead of 25) that may propagate and become a problem in large calculations.


Solution

  • I believe this is due to unwise usage of datatypes with feval.

    It appears to me that feval converts the return type to be the same type as the parameter type. This makes sense, since the return type is extracted from the pointer to the passed argument for that parameter.

    Note that powf takes float parameters and returns a float, while pow takes double parameters and returns a double. int quantities don't have a separate function (prototype) in the CUDA math API, so if you use them they will be cast to and from floating-point types.

    Here's what I see in pure CUDA C++:

    $ cat t32.cu
    #include <math.h>
    #include <stdio.h>
    
    __global__ void Foo( int a, double b)
    {
        float res = powf((float)a, 2);
        printf("powf_int: %d, %d, %f\n", a, (int)res, res);
        res = powf((float)b, 2);
        printf("powf_double: %f, %f, %f\n", b, (double)res, res);
        double dres = pow((double)a, 2);
        printf("pow_int: %d, %d, %f\n", a, (int)dres, dres);
        dres = pow((double)b, 2);
        printf("pow_double: %f, %f, %f\n", b, (double)dres, dres);
    }
    
    int main(){
    
            Foo<<<1,1>>>(-5, -5);
            cudaDeviceSynchronize();
    }
    $ nvcc -o t32 t32.cu
    $ cuda-memcheck ./t32
    ========= CUDA-MEMCHECK
    powf_int: -5, 24, 24.999998
    powf_double: -5.000000, 24.999998, 24.999998
    pow_int: -5, 25, 25.000000
    pow_double: -5.000000, 25.000000, 25.000000
    ========= ERROR SUMMARY: 0 errors
    $
    

    Note that:

    1. CUDA powf returns 24.999998 for (-5,2)
    2. if we convert this to int it gets truncated to 24
    3. if we convert this to double and then round to 3 decimal places, the correctly rounded result would be 25.000, just as displayed in your MATLAB output

    Suggestions:

    1. don't do this
    2. don't use integer types with floating point functions (especially casting the result)
    3. if you want to square something, just multiply it by itself. It will certainly be faster than powf(x, 2) and will possibly be more accurate as well.

    If you want to know why CUDA powf(-5, 2) returns 24.999998, please ask that in a separate question. The accuracy is specified in the programming guide, and I'm reasonably sure this result falls within the published error bounds. Here is another example of pow "weirdness".