c++ · vector · cuda · device · thrust

Is there a thrust::device_vector-equivalent library that can be used within a CUDA kernel?


The automatic memory management of thrust::device_vector is really useful; the only drawback is that it cannot be used from within kernel code.

I've looked on the Internet and have only found vector libraries such as Thrust, which manage device memory from host code. Does any vector library for kernels exist? If not, is it a bad idea to have such a library?


Solution

  • It is possible to write such a library, but it would be very inefficient.

    Indeed, thrust::device_vector differs from thrust::host_vector or std::vector only in that it allocates memory on the device instead of the host. The resizing algorithm is the same and runs on the host.

    The resize logic itself is quite simple, but it involves allocating and freeing memory and copying the data. In a multi-threaded setting, you have to lock the whole vector each time a thread resizes it, which can take quite a while because of the copy.

    In the case of a kernel that appends elements to a vector, the synchronization mechanism would effectively serialize the work, since only one thread at a time is allowed to resize. Your code would therefore run at the speed of a single device processor, minus the (considerable) synchronization overhead. This would probably be quite a lot slower than a CPU implementation.