c pointers caching cuda restrict-qualifier

Pointer to pointer aliasing and the restrict keyword

I'm familiar with the usage of the __restrict keyword for performance optimization in C and specifically CUDA in this case.

void Foo(const float* __restrict X, const float* __restrict Y);

I understand that the __restrict qualifiers on Foo tell the compiler that X and Y are guaranteed to point to distinct, non-overlapping blocks of memory.
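A minimal sketch of that single-level case, in plain C with the same `__restrict` spelling (accepted by GCC, Clang and MSVC). The function `AddInto` and its parameters are hypothetical and only illustrate the promise being made: the output buffer does not overlap either input, so the compiler is free to keep loaded values in registers or reorder loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = x[i] + y[i].
   The __restrict qualifiers promise that the memory written through `out`
   is not also reached through `x` or `y` inside this function. */
void AddInto(float* __restrict out,
             const float* __restrict x,
             const float* __restrict y,
             size_t n)
{
    /* Cheap debug check of the most obvious violation of the promise. */
    assert(out != x && out != y);

    for (size_t i = 0; i < n; ++i) {
        out[i] = x[i] + y[i];
    }
}
```

Calling `AddInto(buf, buf, other, n)` would break the promise and is undefined behavior, even though it would compile.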

How does alias restriction work when we have a pointer to a pointer?

void Bar1(const float* const * __restrict X, const float* const * __restrict Y);
void Bar2(const float* const __restrict * __restrict X, const float* const __restrict * __restrict Y);

Is Bar1 fully restricted or does each level of indirection need to be restricted as shown in Bar2?

Which syntax correctly indicates that all pointers can take advantage of read-only caching? Do I need to "restrict" both pointers or only the top level variable name?

Solution

Do I need to "restrict" both pointers or only the top level variable name?

Restrict both "levels" of pointers.

Even if this is not necessary for enabling use of the non-coherent/read-only cache, it is still the right choice, because you are being more explicit in your description of the input parameters. You are making it clearer to anyone using your function that the inner pointers are expected not to point to overlapping locations.
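As a rough sketch of what restricting both levels looks like in a definition, here is a plain C example using the same qualifier placement as Bar2. The function `RowSums` and its parameters are hypothetical, and whether a given compiler actually exploits the inner qualifier for optimization or caching is an assumption, not something this example demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[r] = sum of rows[r][0 .. ncols-1].
   Outer __restrict (right of the last '*'): the array of row pointers
   is reached only through `rows` inside this function.
   Inner __restrict (between the two '*'): each row's data is reached
   only through its own row pointer, so rows do not overlap each other
   or `out`. */
void RowSums(const float* const __restrict * __restrict rows,
             float* __restrict out,
             size_t nrows, size_t ncols)
{
    assert(rows != NULL && out != NULL);

    for (size_t r = 0; r < nrows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < ncols; ++c) {
            acc += rows[r][c];
        }
        out[r] = acc;
    }
}
```

The signature alone tells a caller that passing the same buffer as two different rows, or as both a row and `out`, is outside the contract.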