I just found out about the deepcopy flag. Until this moment I've always used -ta=tesla:managed
to handle deep copy and I would like to explore the alternative.
I read this article: https://www.pgroup.com/blogs/posts/deep-copy-beta.htm which is well written but I think it does not cover my case. I have a structure of this type:
typedef struct Data_{
double ****Vc;
double ****Uc;
} Data;
The shape of these two arrays is not defined by a member of the struct itself but by members of another structure, which are themselves set only at run time.
How can I use #pragma acc shape(Vc, Uc) in this case?
Without this pragma and copying the structure as follows:
int main(){
Data data;
Initialize(&data);
}
int Initialize(Data *data){
data->Uc = ARRAY_4D(ntot[KDIR], ntot[JDIR], ntot[IDIR], NVAR, double);
data->Vc = ARRAY_4D(NVAR, ntot[KDIR], ntot[JDIR], ntot[IDIR], double);
#pragma acc enter data copyin(data)
PrimToCons3D(data->Vc, data->Uc, grid, NULL);
}
void PrimToCons3D(double ****V, double ****U, Grid *grid, RBox *box){
#pragma acc parallel loop collapse(3) present(V[:NVAR][:nx3_tot][:nx2_tot][:nx1_tot])
for (k = kbeg; k <= kend; k++){
for (j = jbeg; j <= jend; j++){
for (i = ibeg; i <= iend; i++){
double v[NVAR];
#pragma acc loop
for (nv = 0; nv < NVAR; nv++) v[nv] = V[nv][k][j][i];
}
I get FATAL ERROR: data in PRESENT clause was not found on device 1: name=V host:0x1fd2b80 file:/home/Prova/Src/mappers3D.c PrimToCons3D line:140
Btw, this same code works fine with -ta=tesla:managed.
Since you don't provide a full reproducing example, I wasn't able to test this, but it would look something like:
typedef struct Data_{
int i,j,k,l;
double ****Vc;
double ****Uc;
/* Vc is allocated as [NVAR][k][j][i], Uc as [k][j][i][NVAR] */
#pragma acc shape(Vc[0:l][0:k][0:j][0:i])
#pragma acc shape(Uc[0:k][0:j][0:i][0:l])
} Data;
int Initialize(Data *data){
/* The extents are members of Data itself, not of the Vc/Uc pointers. */
data->i = ntot[IDIR];
data->j = ntot[JDIR];
data->k = ntot[KDIR];
data->l = NVAR;
data->Uc = ARRAY_4D(ntot[KDIR], ntot[JDIR], ntot[IDIR], NVAR, double);
data->Vc = ARRAY_4D(NVAR, ntot[KDIR], ntot[JDIR], ntot[IDIR], double);
#pragma acc enter data copyin(data[0:1])
PrimToCons3D(data->Vc, data->Uc, grid, NULL);
}
void PrimToCons3D(double ****V, double ****U, Grid *grid, RBox *box){
int kbeg, jbeg, ibeg, kend, jend, iend;
#pragma acc parallel loop collapse(3) present(V, U)
for (int k = kbeg; k <= kend; k++){
for (int j = jbeg; j <= jend; j++){
for (int i = ibeg; i <= iend; i++){
Keep in mind, though, that the "shape" and "policy" directives were not adopted by the OpenACC standard, and we (the NVHPC compiler team) only produced a Beta implementation, which we have not maintained.
It is probably better to do a manual deep copy, which is standard compliant. I can help with that if you can provide a reproducer that includes how you're doing the array allocation, i.e. "ARRAY_4D".
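For reference, a manual deep copy of one 4D pointer-to-pointer array could look roughly like the sketch below. It is untested on a GPU and rests on several assumptions. alloc4d is a hypothetical stand-in for ARRAY_4D (whose real implementation isn't shown in this thread), and copyin4d and sum4d are illustrative names, not part of the original code. The sketch also assumes OpenACC 2.6 or later, where copying in a sub-array whose parent pointer is already present attaches the device pointer. Because PrimToCons3D receives the arrays directly, only the arrays are copied here, not the Data struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARRAY_4D: nested pointer tables with
   zero-initialised rows. The real macro may lay memory out differently. */
static double ****alloc4d(int n1, int n2, int n3, int n4)
{
    double ****a = malloc(n1 * sizeof *a);
    assert(a != NULL);
    for (int i = 0; i < n1; i++) {
        a[i] = malloc(n2 * sizeof *a[i]);
        assert(a[i] != NULL);
        for (int j = 0; j < n2; j++) {
            a[i][j] = malloc(n3 * sizeof *a[i][j]);
            assert(a[i][j] != NULL);
            for (int k = 0; k < n3; k++) {
                a[i][j][k] = calloc(n4, sizeof *a[i][j][k]);
                assert(a[i][j][k] != NULL);
            }
        }
    }
    return a;
}

/* Manual deep copy: create each pointer level on the device top-down.
   Each inner copyin is expected to attach the device copy of its parent
   pointer (assumed OpenACC 2.6+ behaviour). */
static void copyin4d(double ****a, int n1, int n2, int n3, int n4)
{
    #pragma acc enter data copyin(a[0:n1])
    for (int i = 0; i < n1; i++) {
        #pragma acc enter data copyin(a[i][0:n2])
        for (int j = 0; j < n2; j++) {
            #pragma acc enter data copyin(a[i][j][0:n3])
            for (int k = 0; k < n3; k++) {
                #pragma acc enter data copyin(a[i][j][k][0:n4])
            }
        }
    }
}

/* Example kernel: the top-level pointer table is asserted present and
   the lower levels are reached through the attached device pointers. */
static double sum4d(double ****a, int n1, int n2, int n3, int n4)
{
    double s = 0.0;
    #pragma acc parallel loop collapse(4) reduction(+:s) present(a[0:n1])
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            for (int k = 0; k < n3; k++)
                for (int l = 0; l < n4; l++)
                    s += a[i][j][k][l];
    return s;
}
```

In Initialize this would be called once per array after allocation, e.g. copyin4d(data->Vc, NVAR, ntot[KDIR], ntot[JDIR], ntot[IDIR]), with the extents for Uc in its own order. Without OpenACC enabled the pragmas are ignored and the same code runs on the host, which makes it easy to check the indexing first. A matching exit data delete pass in the reverse (bottom-up) order would be needed before freeing the arrays.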