In CUDA function type qualifiers __device__
and __host__
can be used together in which case the function is compiled for both the host and the device. This allows to eliminate copy-paste. However, there is no such thing as __host__ __device__
variable. I'm looking for an elegant way to do something like this:
__host__ __device__ const double common = 1.0;
__host__ __device__ void foo() {
... access common
}
__host__ __device__ void bar() {
... access common
}
I found out that the following code complies and runs without errors. (all results were obtained on Ubuntu 14.04 with CUDA 7.5 and gcc 4.8.4 as a host compiler)
#include <iostream>
__device__ const double off = 1.0;
__host__ __device__ double sum(int a, int b) {
return a + b + off;
}
int main() {
double res = sum(1, 2);
std::cout << res << std::endl;
cudaDeviceReset();
return 0;
}
$ nvcc main.cu -o main && ./main
4
In fact, nvcc --cuda main.cu
translates cu-file into this:
...
static const double off = (1.0);
# 5 "main.cu"
double sum(int a, int b) {
# 6 "main.cu"
return (a + b) + off;
# 7 "main.cu"
}
# 9 "main.cu"
int main() {
# 10 "main.cu"
double res = sum(1, 2);
# 11 "main.cu"
(((std::cout << res)) << (std::endl));
# 12 "main.cu"
cudaDeviceReset();
# 13 "main.cu"
return 0;
# 14 "main.cu"
}
...
But, no surprise, if the variable off
is declared without const
qualifier (__device__ double off = 1.0
) I get the following output:
$ nvcc main.cu -o main && ./main
main.cu(7): warning: a __device__ variable "off" cannot be directly read in a host function
3
So, returning back to the original question, can I rely on this behavior with global __device__ const
variable? If not, what are the other options?
UPD By the way, the above behavior doesn't reproduce on Windows.
For ordinary floating point or integral types it should be sufficient simply to mark the variable as const
at global scope:
const double common = 1.0;
It should then be usable in any subsequent function, whether host, __host__
, __device__
, or __global__
.
This is supported in the documentation here, subject to various restrictions:
Let 'V' denote a namespace scope variable or a class static member variable that has const qualified type and does not have execution space annotations (e.g.,
__device__
,__constant__
,__shared__
). V is considered to be a host code variable.The value of V may be directly used in device code, if V has been initialized with a constant expression before the point of use, and it has one of the following types:
- builtin floating point type except when the Microsoft compiler is used as the host compiler,
- builtin integral type.
Device source code cannot contain a reference to V or take the address of V.
In other cases, some possible options are:
Use a compiler macro defined constant:
#define COMMON 1.0
Use templating, if the range of choices on the variable is discrete and limited.
For other options/cases, it may be necessary to manage explicit host and device copies of the variable, e.g. using __constant__
memory on the device, and a corresponding copy on the host. Host and device paths within the __host__
__device__
function that accesses the variable could then differentiate behavior based on a nvcc compiler macro (e.g. #ifdef __CUDA_ARCH__ ...