Search code examples
cudaconstantshost

Host float constant usage in a kernel in CUDA


I am using CUDA 5.0. I noticed that the compiler will allow me to use host-declared int constants within kernels. However, it refuses to compile any kernels that use host-declared float constants. Does anyone know the reason for this seeming discrepancy?

For example, the following code runs just fine as is, but it will not compile if the final line in the kernel is uncommented.

#include <cstdio>
#include <cuda_runtime.h>

static int   __constant__ DEV_INT_CONSTANT   = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;

static int   const        HST_INT_CONSTANT   = 3;
static float const        HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;

    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;

    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));

    uselessKernel<<<1, 1>>>(d_val);

    cudaFree(d_val);
}

Solution

  • Adding a const number in the device code is OK, but adding a number stored on the host memory in the device code is NOT.

    Every reference of the static const int in your code can be replaced with the value 3 by the compiler/optimizer when the addr of that variable is never referenced. In this case, it is like #define HST_INT_CONSTANT 3, and no host memory is allocated for this variable.

    But for float var, the host memory is always allocated even it is of static const float. Since the kernel can not access the host memory directly, your code with static const float won't be compiled.

    For C/C++, int can be optimized more aggressively than float.

    You code runs when the comment is ON can be seen as a bug of CUDA C I think. The static const int is a host side thing, and should not be accessible to the device directly.