Tags: c++, memory-management, cuda, gpu, variable-initialization

CUDA error: "dynamic initialization is not supported for __device__, __constant__ and __shared__ variables"


I'm trying to statically initialize read-only std::map variables in GPU memory as follows:

// EXAMPLE 1:
#include <map>
using namespace std;

// first attempt: __device__ extern const
__device__ extern const map<char, const char*> BYTES_TO_WORDS = {
{0xB0, "zero"}, {0xB1, "one"}, {0xB2, "two"}, {0xB3, "three"}};

// second attempt: __constant__ static
enum class Color{RED, GREEN, BLUE};
enum class Device{PC, TABLET, PHONE};

__constant__ static map<Color, Device> COLORS_TO_THINGS = {
{Color::RED,Device::PC},{Color::GREEN,Device::TABLET},{Color::BLUE,Device::PHONE}};

But I'm getting the following error:

dynamic initialization is not supported for __device__, __constant__ and __shared__ variables

I'm confused because I don't get this error when I try something like this:

// EXAMPLE 2:
__device__ extern int PLAIN_ARRAY[] = {1, 2, 3, 4, 5};

I just want to be able to create and initialize a read-only std::map and access it from both CPU and GPU code. I would appreciate it if you could tell me how to do this properly.

EDIT: It was pointed out to me that the standard library is not supported in device code. But the error I'm getting seems to suggest that it's rather a memory-management issue.


Solution

  • Initializing a C++ object such as an std::map involves calling a constructor at runtime. The C++11 syntax you are using to initialize your std::maps is a form of list initialization, which calls the std::initializer_list overload of std::map's constructor. Your example with PLAIN_ARRAY does not call any constructors: it is a form of aggregate initialization, which only involves initializing some ints by value, and initializing an int does not require a constructor call.

    In CUDA, it is not possible to use any kind of dynamic initialization with global variables stored on the GPU, such as __device__ and __constant__ variables, which means the initial value of the object must be known at compile-time, and not only produced at runtime after calling a constructor.

    Another issue is that even in contexts where you can call constructors in device code, you wouldn't be able to call std::map's constructor. Being part of the C++ standard library, std::map has no __device__ constructor, nor any other __device__ member functions, so it can only be used from host code; the CUDA runtime does not define any kind of device functionality for the C++ standard library classes. Even if you managed to cudaMemcpy() an std::map from host memory to GPU memory, you would not be able to use the object: firstly, all of its member functions are __host__ functions with no __device__ counterparts, and secondly, an std::map internally holds pointers to dynamically allocated host memory, which are not valid addresses on the GPU.

    An alternative would be to use plain arrays of structs instead of maps, for example:

    __device__
    const struct {
        unsigned char byte;
        // fixed-size char array: the string data itself is stored in device
        // memory, rather than behind a pointer to a host string literal
        const char word[10];
    } BYTES_TO_WORDS[] = {
        {0xB0, "zero"},
        {0xB1, "one"},
        {0xB2, "two"},
        {0xB3, "three"}
    };
    

    However, unlike with std::map, you will have to implement looking up a value by its key manually.
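
    For instance, a device-side lookup over that array can be a simple linear search. A minimal sketch, assuming the BYTES_TO_WORDS array above (the function name word_for_byte is made up for illustration):

    __device__
    const char* word_for_byte(unsigned char b)
    {
        // linear scan over the table; fine for a handful of entries
        for (size_t i = 0; i < sizeof BYTES_TO_WORDS / sizeof BYTES_TO_WORDS[0]; ++i) {
            if (BYTES_TO_WORDS[i].byte == b)
                return BYTES_TO_WORDS[i].word;  // array decays to const char*
        }
        return nullptr; // key not found
    }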


    I just want to be able to create and initialize a read-only std::map and access it from both CPU and GPU code.

    Unfortunately, this is not trivial, since you can't define a variable as both __device__ and __host__. To access a __device__ variable from host code, you would have to use cudaMemcpyFromSymbol(), which is quite awkward compared to just accessing a variable normally. Therefore you may end up having to define your constants in host memory and then copy them to device memory at startup:

    // same layout as the struct used above
    struct byte_word {
        unsigned char byte;
        char word[10];
    };

    const byte_word BYTES_TO_WORDS[] = {
        {0xB0, "zero"},
        // ...
    };

    // uninitialized array in device memory
    __device__
    byte_word DEV_BYTES_TO_WORDS[sizeof BYTES_TO_WORDS / sizeof(byte_word)];

    // at startup, use `cudaMemcpyToSymbol()` to populate `DEV_BYTES_TO_WORDS`
    // from `BYTES_TO_WORDS`.
    
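    That startup copy could look roughly like this (a sketch; error handling reduced to a single check):

    // host code, run once before launching any kernels that use the table
    cudaError_t err = cudaMemcpyToSymbol(DEV_BYTES_TO_WORDS, BYTES_TO_WORDS,
                                         sizeof BYTES_TO_WORDS);
    if (err != cudaSuccess) { /* handle the error */ }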

    An alternative would be to use a preprocessor define to effectively copy and paste the same initializer across both arrays, rather than copying the data over at runtime. In any case, two separate arrays are required.
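
    For example, the shared initializer could look like this (a sketch; the macro name is made up for illustration, and the statically initialized device array replaces the uninitialized array and runtime copy shown above):

    // write the initializer once as a macro...
    #define BYTES_TO_WORDS_INIT { \
        {0xB0, "zero"},           \
        {0xB1, "one"},            \
        {0xB2, "two"},            \
        {0xB3, "three"}           \
    }

    // ...and use it to statically initialize both the host and the device array
    const byte_word BYTES_TO_WORDS[] = BYTES_TO_WORDS_INIT;
    __device__ const byte_word DEV_BYTES_TO_WORDS[] = BYTES_TO_WORDS_INIT;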