How to `std::shuffle` CSR arrays in OpenACC

To set up the context: I have a working C++ implementation of a graph algorithm that I am trying to accelerate with OpenACC / nvc++, both on GPU and multicore. I originally stored the adjacency list as a vector of vectors. Because vectors hurt performance, I decided to rewrite it using CSR (compressed sparse row) arrays, the most common layout in GPU programs. The goal is to randomize the order of each vertex's neighbours.
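For readers unfamiliar with the layout, the conversion can be sketched as follows. `toCSR` is a hypothetical stand-in for the question's `convert_VecOfVec_to_CSR`, whose body is not shown; it uses `std::vector` for ownership, and `.data()` yields the raw `int*` pointers the question passes around. `offset` has n+1 entries, and the neighbours of vertex `i` live in `value[offset[i]]` through `value[offset[i+1]-1]`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: flattens an adjacency list (vector of vectors) into CSR.
// offset[i]..offset[i+1]-1 indexes the neighbours of vertex i inside value.
void toCSR(const std::vector<std::vector<int>>& adj,
           std::vector<int>& offset, std::vector<int>& value) {
    const std::size_t n = adj.size();
    offset.assign(n + 1, 0);
    // Prefix sum of the degrees gives the start of each neighbour range.
    for (std::size_t i = 0; i < n; ++i)
        offset[i + 1] = offset[i] + static_cast<int>(adj[i].size());
    value.resize(static_cast<std::size_t>(offset[n]));
    // Copy each neighbour list into its slot in the flat array.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < adj[i].size(); ++k)
            value[offset[i] + k] = adj[i][k];
}
```

For `{{1, 2}, {0}, {0, 1, 3}, {}}` this gives `offset = {0, 2, 3, 6, 6}` and `value = {1, 2, 0, 0, 1, 3}`.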

After building the CSR arrays, I used `std::shuffle` on each vertex's neighbour range:

    int* offset;
    int* value;

    convert_VecOfVec_to_CSR(a, offset, value);
    shuffleIt(offset, value, n);

    // DEFINITION
    void shuffleIt(int *&offset, int *&value, int n) {
      #pragma acc parallel loop
      for (int i = 0; i < n; ++i) {
        std::shuffle(value + offset[i], value + offset[i + 1], std::default_random_engine(rand()));
      }
    }

With `-acc=multicore` the code compiles and runs fine, but for the GPU I get the error below. Does that mean I can NOT use `std::shuffle` or the random-number functions in OpenACC GPU code? I recall that `math.h` functions worked fine. Please advise. Thanks.

    $ nvc++ -acc -gpu=managed -Minfo=all shuffle-2D-csr2.cpp -o shuffle-2D-csr2.out && ./shuffle-2D-csr2.out
    NVC++-S-1061-Procedures called in a compute region must have acc routine information - std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::linear_... (shuffle-2D-csr2.cpp: 66)
    shuffleIt(int *&, int *&, int):
         66, Accelerator restriction: call to 'std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::linear_congruential_engine(unsigned long)' with no acc routine information
    std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::operator ()():
          7, include "random"
              49, include "random.h"
                  348, Generating implicit acc routine seq
                       Generating acc routine seq
                       Generating NVIDIA GPU code
    ...
    NVC++/x86-64 Linux 22.7-0: compilation completed with severe errors

For now I have not added any data clauses; I want to get the functionality correct first.
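Once `-gpu=managed` is dropped, explicit data clauses for the two CSR arrays could look like the sketch below (an assumption about how the full code is organized). `offset` is only read on the device, so `copyin` suffices; `value` is modified in place, so it needs `copy`. The hypothetical kernel `reverseRows` only reverses each neighbour range, as a deterministic placeholder for the shuffle.

```cpp
// Sketch only: explicit OpenACC data clauses for CSR arrays.
// offset: n+1 entries, read-only on the device  -> copyin
// value:  offset[n] entries, modified in place  -> copy (in and back out)
// reverseRows is a hypothetical placeholder kernel, not a shuffle.
void reverseRows(const int* offset, int* value, int n) {
    const int nnz = offset[n];  // total number of stored neighbours
    #pragma acc parallel loop copyin(offset[0:n+1]) copy(value[0:nnz])
    for (int i = 0; i < n; ++i) {
        int lo = offset[i], hi = offset[i + 1] - 1;
        while (lo < hi) {  // two-pointer in-place reversal of one row
            int tmp = value[lo];
            value[lo] = value[hi];
            value[hi] = tmp;
            ++lo;
            --hi;
        }
    }
}
```

Compiled without `-acc`, the pragma is ignored and the loop simply runs serially on the host.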

The full code is available in a Gist.

P.S. I am learning OpenACC.

Solution

To call a function from device code, a device version of that routine must exist. If the function is a template, or its definition is visible to the compiler when the calling file is compiled, nvc++ can typically generate the device version for you automatically. Otherwise, you need to decorate the function with the `acc routine` directive so that a device version is generated. However, I doubt you'll be able to add a `routine` directive here, given that the function comes from the system's C++ standard library.

Parts of libc++ have been ported to the device (see libcu++: https://nvidia.github.io/libcudacxx/), but not the entire library yet.

In general, though, you need to be careful with random number generation on the device. RNGs are not thread-safe, so threads that share a generator can collide on its state variables. I wrote a detailed response on this a few years ago, which you can read here: Portable random number generation with OpenACC

This article may also be helpful: https://www.openacc.org/blog/pseudo-random-number-generation-lightweight-threads