cuda, shared-ptr, raii

How to implement RAII of CUDA API type cudaEvent_t using shared_ptr


The CUDA API has types that require create() and destroy() calls, analogous to new and delete for memory allocation. In the spirit of RAII, and rather than having to call cudaEventCreate( &event ) and cudaEventDestroy( event ) by hand, I wrote the following wrapper for cudaEvent_t.

My Question: Is this acceptable code without any obvious errors?

It builds for me and I've yet to discover a problem. But I particularly do not like the reinterpret_cast<> trickery used to get the cudaEvent_t variable through the custom allocator and deleter for the shared_ptr.

Some related posts:

CUDA: Wrapping device memory allocation in C++

Is there a better/cleaner/more elegant way to malloc and free in cuda?

class CudaEvent {
private:
    struct Deleter {
        void operator()(cudaEvent_t * ptr) const {
            checkCudaErrors( cudaEventDestroy( reinterpret_cast<cudaEvent_t>(ptr) ));
        }
    };

    shared_ptr<cudaEvent_t> Allocate( ){
        cudaEvent_t event;
        checkCudaErrors( cudaEventCreate( &event ) );
        shared_ptr<cudaEvent_t> p( reinterpret_cast<cudaEvent_t*>(event), Deleter() );
        return p;
    }

    shared_ptr<cudaEvent_t> ps;

public:
    cudaEvent_t event;

    CudaEvent(  )
    : ps( Allocate( ) ),
      event( *(ps.get()) )
    {   }
};

Solution

  • You're conflating two independent mechanisms: an RAII class for CUDA events, and lifetime management using a shared pointer. These should be kept separate.

    Another issue is that it's not clear what your "checkCudaErrors" is supposed to do.

    A last issue is the one talonmies mentioned: what happens if you get the scope/lifetime wrong? For example, you reset the device before the last reference to this event has been released. Or you enqueue this event on a stream, then drop the pointer to it. So you're not really guaranteed safety by using the shared pointer; you'll have to keep track of things just as you would if you only had the id. In fact, this might make things even more difficult.

    Finally, note that you can use the CUDA runtime API with modern-C++ wrappers, which, specifically, use RAII rather than createXYZ() and destroyXYZ():

    https://github.com/eyalroz/cuda-api-wrappers

    Specifically, you can have a look at:

    Due disclosure: I'm the author of this library.
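
To make the separation described above concrete, here is a minimal sketch, not the library's code and not the asker's. It assumes hypothetical stand-ins (FakeEvent_st, fakeEvent_t, fakeEventCreate, fakeEventDestroy, g_live_events) so that it compiles without the CUDA toolkit; in real code they would be replaced by cudaEvent_t, cudaEventCreate and cudaEventDestroy. The class Event only does RAII; shared ownership, where it is wanted at all, is layered on top with an ordinary std::shared_ptr<Event>. The last function shows one way to avoid the reinterpret_cast from the question: since the handle type is itself a pointer, shared_ptr can hold the pointee type directly with a custom deleter.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// ASSUMPTION: stand-ins for the CUDA runtime so the sketch builds anywhere.
// Real code would use cudaEvent_t, cudaEventCreate and cudaEventDestroy.
struct FakeEvent_st { int id; };
using fakeEvent_t = FakeEvent_st*;   // like cudaEvent_t, an opaque pointer type
static int g_live_events = 0;        // handles created but not yet destroyed

int fakeEventCreate(fakeEvent_t* out) {   // returns 0 on success
    *out = new FakeEvent_st{++g_live_events};
    return 0;
}
int fakeEventDestroy(fakeEvent_t e) {
    delete e;
    --g_live_events;
    return 0;
}

// Mechanism 1: a move-only RAII owner. One object, one handle, destroyed once.
class Event {
public:
    Event() {
        if (fakeEventCreate(&handle_) != 0)
            throw std::runtime_error("event creation failed");
    }
    ~Event() {
        if (handle_ != nullptr)
            fakeEventDestroy(handle_);   // result ignored: destructors should not throw
    }
    Event(const Event&) = delete;
    Event& operator=(const Event&) = delete;
    Event(Event&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}
    Event& operator=(Event&& other) noexcept {
        if (this != &other) {
            if (handle_ != nullptr) fakeEventDestroy(handle_);
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }
    fakeEvent_t get() const noexcept { return handle_; }
private:
    fakeEvent_t handle_ = nullptr;
};

// Mechanism 2: shared lifetime, added from outside with a plain shared_ptr.
void share_one_event() {
    auto a = std::make_shared<Event>();
    std::shared_ptr<Event> b = a;      // second owner of the same Event
    assert(a->get() == b->get());
    a.reset();
    assert(g_live_events == 1);        // b still keeps the event alive
}   // b goes out of scope here and the event is destroyed

// Alternative without a wrapper class and without any cast: own the pointee
// type directly and pass the destroy call as the deleter.
std::shared_ptr<FakeEvent_st> make_shared_event() {
    fakeEvent_t e = nullptr;
    if (fakeEventCreate(&e) != 0)
        throw std::runtime_error("event creation failed");
    return std::shared_ptr<FakeEvent_st>(e, [](fakeEvent_t p) { fakeEventDestroy(p); });
}
```

With the real API, the handle would be read back with get() on either kind of owner rather than by dereferencing a pointer that was produced by casting the handle, as `*(ps.get())` does in the question. Neither variant addresses the lifetime hazards mentioned above: an event destroyed after a device reset, or released while still enqueued on a stream, remains the caller's problem.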