Say I have the following function:
__global__ void testFunction(double *a, double *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    a[index] = fmax(b[index], 0.0f);
}
then building gives the error:
error: calling a constexpr __host__ function("fmax") from a __global__ function("testFunction") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
This apparently means that I cannot use the function fmax(), which is defined in the standard library header cmath. However, it is also defined as a CUDA device function, which is the one I wanted to use in the first place.
So the question is: how do I tell CMake (or the compiler, if that's an option) to use the CUDA __device__ definition of fmax() instead of the cmath version?
Note: I am not using using namespace std; or #include <cmath> in this code.
CMakeLists.txt:
cmake_minimum_required(VERSION 3.17)
project(NAME CUDA)

set(CMAKE_CUDA_STANDARD 14)

add_executable(NAME main.cu /*some other files*/)

set_target_properties(
    NAME
    PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON)
The problem has nothing to do with CMake or linking.
CUDA uses function overloading to provide the native math library functions in device code. As per the documentation, exactly two device overloads of fmax are provided:
float fmax(float, float)
double fmax(double, double)
Your code is requesting
double fmax(double, float)
because 0.0f is a single precision constant. There is no native device overload with that signature, so overload resolution falls through in the CUDA front end and the compiler concludes you want a host function, hence the error.
The correct code would be
a[index] = fmax(b[index], 0.);
That will select the double precision overload.
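For completeness, here is a minimal corrected version of the kernel from the question, plus a hypothetical float variant (my addition, not part of the question) showing the single precision counterpart fmaxf:

__global__ void testFunction(double *a, double *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    // 0.0 is a double constant, so this resolves to the native
    // double fmax(double, double) device overload
    a[index] = fmax(b[index], 0.0);
}

// Hypothetical float variant: if the arrays were float, the matching
// native device functions would be fmaxf or the float fmax overload
__global__ void testFunctionFloat(float *a, float *b) {
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    a[index] = fmaxf(b[index], 0.0f);
}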