Tags: cuda, restrict-qualifier

CUDA: How to apply __restrict__ on array of pointers to arrays?


This kernel, which takes two __restrict__ int arrays, compiles fine:

__global__ void kerFoo( int* __restrict__ arr0, int* __restrict__ arr1, int num )
{
    for ( /* Iterate over array */ )
        arr1[i] = arr0[i];  // Copy one to other
}

However, when the same two int arrays are combined into an array of pointers, compilation fails:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( /* Iterate over array */ )
        arr[1][i] = arr[0][i];  // Copy one to other
}

The error given by the compiler is:

error: invalid use of `restrict'

I have structures that contain an array of pointers to arrays (for example, a struct passed to the kernel that has int* arr[16]). How do I pass them to kernels and still apply __restrict__ to them?


Solution

  • Filling in the comment in your code with some arbitrary iteration, we get the following program:

    __global__ void kerFoo( int* __restrict__ arr[2], int num )
    {
        for ( int i = 0; i < 1024; i++ )
            arr[1][i] = arr[0][i];  // Copy one to other
    }
    

    and this compiles fine with CUDA 10.1 (Godbolt.org).
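
For the struct case raised in the question, one possible approach is to pass the struct by value and bind its pointers to local __restrict__-qualified pointers inside the kernel. This is only a sketch and is not something the answer above confirms. The names ArrayPair, kerCopy, and copyThroughStruct are hypothetical, and whether the compiler actually exploits the qualifier in this form is an assumption that would need to be checked in the generated code. Note also that a parameter declared as int* __restrict__ arr[2] is adjusted to int* __restrict__* arr, so with that signature the pointer table itself must live in device-accessible memory.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical struct for illustration: a fixed-size table of device pointers.
struct ArrayPair
{
    int* arr[2];
};

// The struct is passed by value, so the pointer table travels with the kernel arguments.
__global__ void kerCopy( ArrayPair pair, int num )
{
    // Local restrict-qualified aliases: a promise that src and dst never overlap.
    const int* __restrict__ src = pair.arr[0];
    int* __restrict__       dst = pair.arr[1];

    // Grid-stride loop so any launch configuration covers all num elements.
    for ( int i = blockIdx.x * blockDim.x + threadIdx.x; i < num; i += gridDim.x * blockDim.x )
        dst[i] = src[i];
}

// Host helper: copies num ints through the kernel and checks the result.
// CUDA error checking is omitted for brevity.
bool copyThroughStruct( int num )
{
    std::vector<int> input( num ), output( num, -1 );
    for ( int i = 0; i < num; ++i )
        input[i] = i;

    ArrayPair pair;
    cudaMalloc( &pair.arr[0], num * sizeof( int ) );
    cudaMalloc( &pair.arr[1], num * sizeof( int ) );
    cudaMemcpy( pair.arr[0], input.data(), num * sizeof( int ), cudaMemcpyHostToDevice );

    kerCopy<<< 4, 128 >>>( pair, num );
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy( output.data(), pair.arr[1], num * sizeof( int ), cudaMemcpyDeviceToHost );

    cudaFree( pair.arr[0] );
    cudaFree( pair.arr[1] );
    return input == output;
}

int main()
{
    printf( "%s\n", copyThroughStruct( 1024 ) ? "copy ok" : "copy mismatch" );
    return 0;
}
```

The local aliases keep the struct definition free of qualifiers, so the same struct can be shared with host code. The promise that the two arrays do not overlap is made only inside the kernel that relies on it.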