
CUDA __float_as_int in acosf implementation


The CUDA C math library's implementation of acosf (cuda/math_function.h) contains this passage:

if (__float_as_int(a) < 0) {
  t1 = CUDART_PI_F - t1;
}

where a and t1 are floats and CUDART_PI_F is a float constant set to a value close to the mathematical constant Pi. I am trying to understand what the if-clause is testing for, and what the C equivalent of the if-clause or of the function/macro __float_as_int(a) would be. I searched for the implementation of __float_as_int() without success; it seems to be a macro or function built into NVIDIA's NVCC. Looking at the PTX that NVCC generates for the passage above:

    .reg .u32 %r<4>;
    .reg .f32 %f<46>;
    .reg .pred %p<4>;
    // ...
    mov.b32         %r1, %f1;
    mov.s32         %r2, 0;
    setp.lt.s32     %p2, %r1, %r2;
    selp.f32        %f44, %f43, %f41, %p2;

it becomes clear that __float_as_int() is not a float-to-int rounding. (That would have yielded a cvt.s32.f32.) Instead, it copies the bits of the float %f1 unchanged (b32) into %r1 (note that %r1 is declared as u32, i.e. unsigned int) and then compares %r1 as if it were an s32 (signed int, which is confusing) with %r2 (whose value is 0).
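To make the difference between the two operations concrete, here is a minimal C sketch contrasting a value conversion (what a cvt-style instruction would do) with a bit copy (what mov.b32 does). The function names convert_to_int and reinterpret_as_int are made up for illustration, and the sketch assumes float is a 32-bit IEEE 754 type, which is the usual case but not guaranteed by the C standard.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>   /* int32_t */
#include <string.h>   /* memcpy */

/* Assumption: float is 32 bits wide, so its bits fit exactly in an int32_t. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits wide");

/* Value conversion: the numeric value is converted, truncating toward zero.
   This is the kind of operation a cvt instruction performs. */
int32_t convert_to_int(float x)
{
    return (int32_t)x;
}

/* Bit copy: the 32 bits of the float are copied unchanged into an integer.
   This mirrors what mov.b32 does in the PTX above. */
int32_t reinterpret_as_int(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, convert_to_int(-1.5f) yields -1, whereas reinterpret_as_int(-1.5f) yields the raw bit pattern 0xBFC00000, whose most significant bit is the float's sign bit.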

To me this looks a little odd. But obviously it is correct.

Can someone explain what's going on, and in particular what __float_as_int() is doing in the context of an if-clause that tests for being negative (< 0)? And could someone provide a C equivalent of the if-clause and/or the __float_as_int() macro?


Solution

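  • __float_as_int reinterprets the bits of a float as an int. A signed int is < 0 exactly when its most significant bit is set. For a float, that bit is the sign bit, so the test checks whether the sign bit is set. That is not exactly the same as the number being negative (e.g. it can be 'negative zero', which has the sign bit set but does not compare less than 0.0). Checking the sign bit this way can be faster than checking whether the float is < 0.0.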

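    A C equivalent could look like:

    int __float_as_int(float in) {
         /* Assumes int and float are both 32 bits wide. */
         union fi { int i; float f; } conv;
         conv.f = in;      /* store the float ... */
         return conv.i;    /* ... and read the same bits back as an int */
    }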
    

    In some other version of this header __cuda___signbitf is used instead.
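    As a rough illustration of the whole if-clause in plain C, the sketch below reinterprets the bits with memcpy (an alternative to the union that is also valid in C++) and applies the same sign-bit test. The names float_as_int, sign_bit_set and reflect_if_sign_bit_set are hypothetical, the value of pi_f is only a stand-in for CUDART_PI_F, and the code assumes float is a 32-bit IEEE 754 type.

```c
#include <assert.h>   /* static_assert */
#include <math.h>     /* signbit */
#include <stdint.h>   /* int32_t */
#include <string.h>   /* memcpy */

/* Assumption: float is 32 bits wide. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits wide");

/* Hypothetical stand-in for __float_as_int: copy the bits, no conversion. */
int32_t float_as_int(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Standard C way to ask the same question: is the sign bit set? */
int sign_bit_set(float x)
{
    return signbit(x) != 0;
}

/* The if-clause from the question, wrapped in a function for illustration. */
float reflect_if_sign_bit_set(float a, float t1)
{
    const float pi_f = 3.14159265f;  /* stand-in for CUDART_PI_F */
    if (float_as_int(a) < 0) {       /* true when the sign bit of a is set */
        t1 = pi_f - t1;
    }
    return t1;
}
```

    With these definitions, float_as_int(-0.0f) < 0 is true even though -0.0f < 0.0f is false, which is the 'negative zero' case mentioned above; for every other non-NaN input the two tests agree.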