To set up the context: I have working C++ code (a graph algorithm) that I am trying to accelerate with OpenACC / nvc++ on both GPU and multicore. I originally used a vector of vectors to store the adjacency list. Since nested vectors hurt performance, I decided to rewrite it using CSR arrays, the most common layout in GPU programs. The goal is to randomize the neighbours of each vertex.
After creating the CSR arrays, I used std::shuffle to do that.
int* offset;
int* value;
convert_VecOfVec_to_CSR(a, offset, value);
shuffleIt(offset, value, n);
// DEFINITION
void shuffleIt(int *&offset, int *&value, int n) {
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) {
        std::shuffle(value + offset[i], value + offset[i + 1], std::default_random_engine(rand()));
    }
}
When acc=multicore is used, the code compiles and runs fine, but I get the error below for gpu. Does that mean I can NOT use std::shuffle or the random functions in my OpenACC gpu code? I recall that math.h functions were working fine. Please advise. Thanks.
$ nvc++ -acc -gpu=managed -Minfo=all shuffle-2D-csr2.cpp -o shuffle-2D-csr2.out && ./shuffle-2D-csr2.out
NVC++-S-1061-Procedures called in a compute region must have acc routine information - std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::linear_... (shuffle-2D-csr2.cpp: 66)
shuffleIt(int *&, int *&, int):
66, Accelerator restriction: call to 'std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::linear_congruential_engine(unsigned long)' with no acc routine information
std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::operator ()():
7, include "random"
49, include "random.h"
348, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
...
NVC++/x86-64 Linux 22.7-0: compilation completed with severe errors
I have not added any data clause for now, so that I can get the functionality correct first.
P.S. I am learning OpenACC.
In order to call a function from device code, a device version of that routine must be available. If you're using templates, or the definition of the function is visible to the compiler at compile time, nvc++ can typically auto-generate the device function for you. Otherwise, you need to decorate the function with the "acc routine" directive so that a device version of the routine is generated. However, I doubt you'll be able to add a "routine" directive here, given that it's a standard library routine rather than your own code.
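For code you do own, the directive can be sketched as follows. This is only an illustration: the helper names degree and max_degree are hypothetical and not part of the original post, and the data clause assumes offset holds n + 1 entries, as in a CSR layout. When the helper is defined in the same file, nvc++ would usually generate the device version implicitly; the explicit directive matters mostly when the definition lives in another translation unit.

```cpp
// Hypothetical helper: number of neighbours of vertex v in a CSR graph.
// "acc routine seq" asks the compiler to also build a sequential device
// version, so the function can be called from inside a compute region.
#pragma acc routine seq
int degree(const int* offset, int v) {
    return offset[v + 1] - offset[v];
}

// Hypothetical caller: largest degree over all n vertices.
int max_degree(const int* offset, int n) {
    int best = 0;
    // copyin is only needed when managed memory is not in use.
    #pragma acc parallel loop reduction(max:best) copyin(offset[0:n+1])
    for (int v = 0; v < n; ++v) {
        int d = degree(offset, v);  // device call to the decorated routine
        if (d > best) best = d;
    }
    return best;
}
```

Without OpenACC enabled, the pragmas are ignored and the same code runs serially on the host.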
Some of libc++ has been ported to the device (see: https://nvidia.github.io/libcudacxx/), but not all of it yet.
In general, though, you need to be careful with random number generation on the device. RNGs are typically not thread safe, and threads can collide on the shared state variables. I wrote a detailed response on this a few years ago, which you can read here: Portable random number generation with OpenACC
This article may also be helpful: https://www.openacc.org/blog/pseudo-random-number-generation-lightweight-threads