reinterpret_cast of pointer-to-pointer - is it Undefined Behavior?

We need to cast int** to void**, for which reinterpret_cast can be used. However, is it technically allowed by the C++ Standard, or are we in Undefined Behavior territory?

Use case

When using NVIDIA libraries, we need to pass a pointer-to-pointer:

cudaError_t cudaMalloc(void** ptr, size_t size);

So a typical solution is:

int* p;
cudaMalloc(reinterpret_cast<void**>(&p), 123);


  • The cast itself is permitted, but is not practically useful under the guarantees made by the standard, because any non-trivial use of the result will eventually cause undefined behavior.

    Specifically, what matters is the alignment of the pointed-to objects: if alignof(void*) > alignof(int*), then the address of the int* object may not be suitably aligned for a void* object. In that case the result of the cast is unspecified, and practically any use of the result would consequently cause undefined behavior. It would then also be impossible to cast back and recover the original value.

    Otherwise, the cast, passing the pointer around, and casting back to int** are all permitted. However, accessing the object through the void** (e.g. to write a pointer value into the provided pointer-to-pointer) is an aliasing violation, because an object of type int* may not be accessed through an lvalue of type void*. So if cudaMalloc accesses the pointer without first casting back to int**, then the behavior is undefined.
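
    To make the access in question concrete, here is a minimal sketch of a function with a cudaMalloc-like out-parameter. The name generic_alloc and its int return code are hypothetical stand-ins, not part of CUDA, and the function is backed by std::malloc only so that it runs anywhere. The point is the single assignment through the void** parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an API with a void** out-parameter.
// Returns 0 on success and 1 on failure (an assumption made for this sketch).
int generic_alloc(void** out, std::size_t size) {
    assert(out != nullptr);
    // This assignment accesses *out as an object of type void*.
    // It is well-defined only if the caller's object really is a void*.
    *out = std::malloc(size);
    return *out != nullptr ? 0 : 1;
}
```

    Calling it with the address of a genuine void* object is fine. Calling it as generic_alloc(reinterpret_cast<void**>(&p), n) with int* p makes that same assignment write to an int* object through a void* lvalue, which is the aliasing violation described above.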

    That's all the C++ standard has to say on this. Given that you seem to be asking about CUDA specifically, its specification can of course make additional guarantees. And that's probably going to be very significant in practice.

    However, if you consider this as some generic malloc-like implementation, then it is impossible to write this function so that this usage pattern would have defined behavior, because the function has no way of knowing which pointer type it must cast back to first. And that's not only a theoretical concern: many compilers do use this kind of UB from aliasing violations for optimization.

    The correct pattern that can be implemented without aliasing violation and UB would instead be something like

    void* p_temp;
    cudaMalloc(&p_temp, 123);
    auto p = static_cast<int*>(p_temp);
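
    If this pattern is needed in many places, it could be wrapped in a small helper. The following is only a sketch under stated assumptions: fake_device_malloc is a hypothetical stand-in with the same out-parameter shape as cudaMalloc (implemented with std::malloc so the example runs without a GPU), and typed_alloc is a hypothetical helper name, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc: same void** out-parameter shape,
// but host memory and an int status code (0 means success).
int fake_device_malloc(void** dev_ptr, std::size_t size) {
    assert(dev_ptr != nullptr);
    *dev_ptr = std::malloc(size);
    return *dev_ptr != nullptr ? 0 : 1;
}

// Hypothetical helper: allocates room for `count` objects of type T and
// returns a typed pointer, or nullptr if the allocation failed.
// Overflow of count * sizeof(T) is not checked in this sketch.
template <typename T>
T* typed_alloc(std::size_t count) {
    void* raw = nullptr;  // a real void* object, so the callee's write is valid
    if (fake_device_malloc(&raw, count * sizeof(T)) != 0) {
        return nullptr;
    }
    // Convert the pointer value itself, not the pointer-to-pointer.
    return static_cast<T*>(raw);
}
```

    With the real API, the helper body would call cudaMalloc(&raw, count * sizeof(T)) and compare the result against cudaSuccess instead of 0. The call site then becomes int* p = typed_alloc<int>(n); with no reinterpret_cast anywhere.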