Say I have the following function:
__global__ void testFunction(double *a, double *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    a[index] = fmax(b[index], 0.0f);
}
then building gives the error:
error: calling a constexpr __host__ function("fmax") from a __global__ function("testFunction") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
This apparently means that I cannot use the function fmax(), which is defined in the standard library header cmath. However, it is also defined as a CUDA device function, which is the one I wanted to use in the first place.
So the question is: how do I tell CMake (or the compiler, if that's an option) to use the CUDA __device__ definition of fmax() instead of the cmath version?
Note: I am not using using namespace std; or #include <cmath> in this code.
CMakeLists.txt:
cmake_minimum_required(VERSION 3.17)
project(NAME CUDA)

set(CMAKE_CUDA_STANDARD 14)

add_executable(NAME main.cu /*some other files*/)

set_target_properties(
    NAME
    PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON)
The problem has nothing to do with CMake or linking.
CUDA uses function overloading to provide the native math library functions in device code. As per the documentation, exactly two device overloads of fmax are provided:
float fmax(float, float)
double fmax(double, double)
Your code is requesting
double fmax(double, float)
because 0.0f is a single precision constant. There is no native device overload with that signature, so overload resolution falls through in the CUDA front end and the compiler concludes you want a host function, hence the error.
The correct code would be
a[index] = fmax(b[index], 0.);
That will select the double precision overload.
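For completeness, here is a minimal corrected version of the kernel from the question, plus a hypothetical float variant (my addition, not part of the question) showing the single precision counterpart fmaxf:

__global__ void testFunction(double *a, double *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    // 0.0 is a double constant, so this resolves to the native
    // double fmax(double, double) device overload
    a[index] = fmax(b[index], 0.0);
}

// Hypothetical float variant: if the arrays were float, the matching
// native device functions would be fmaxf or the float fmax overload
__global__ void testFunctionFloat(float *a, float *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    a[index] = fmaxf(b[index], 0.0f);
}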