Tags: memory-management, cuda, thrust, reduction

Memory management of the results of CUDA thrust::minmax_element with a pair of device_ptr as return type


I have a point cloud in device memory in dPointsWS, laid out so that all x-coordinates are stored first, then all y-coordinates, and lastly all z-coordinates. I use Thrust to compute a tight axis-aligned bounding box (AABB) of this point cloud. Here is my code:

// use CUDA thrust library for AABB computation
thrust::pair<thrust::device_ptr<Real>, thrust::device_ptr<Real>> thrustAABB[3];

// do a parallel min/max reduction on the GPU for each coordinate axis
thrust::device_ptr<Real> dPointsWS(mDPointsWS);
for (uint32 i = 0, offset = 0; i < 3; ++i, offset += mPointCount)
    thrustAABB[i] = thrust::minmax_element(dPointsWS + offset,
                                           dPointsWS + offset + mPointCount);
cudaDeviceSynchronize();

// get results from the GPU
for (uint32 i = 0; i < 3; ++i)
{
    mAABBWS[2 * i + 0] = *thrustAABB[i].first;
    mAABBWS[2 * i + 1] = *thrustAABB[i].second;
}

What I am wondering about is where the result of thrust::minmax_element is stored before the last code block. The last block clearly downloads the results to host memory, but I would like to avoid this. I've found the following question: thrust reduction result on device memory. However, my case is different since the return type is thrust::pair<thrust::device_ptr<Real>, thrust::device_ptr<Real>>.

As the reduction function returns a pair of device_ptr objects, the minimum and maximum results should be stored on the GPU, or do I misunderstand this? And if the results are stored on the GPU, how can I control their lifetime? For example, I would like to use the results directly for AABB drawing with OpenGL without downloading them to host memory.


Solution

  • The minimum and maximum elements found by thrust::minmax_element reside in your mDPointsWS array (or whatever allocation it points to; you haven't shown a full example). The thrust operation does not move any data or store any numerical min/max results anywhere. It simply returns two pointers (device_ptr objects, in this case), both of which point to positions within the allocation referenced by mDPointsWS. The first member of the pair points to the position of the min element and the second to the position of the max element, each within the range [offset, offset + mPointCount) that was passed in for that axis.

    Therefore, the "lifetime" of the "results" is simply the lifetime of the underlying array referenced by mDPointsWS (which presumably you allocated, and therefore you should know and be able to control its lifetime). The "storage" of the "results" is on the GPU in this case -- right where you put them in the mDPointsWS array -- they haven't moved anywhere.

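    The "lifetime" of the pointers themselves is just the lifetime of the thrustAABB array, an ordinary host-side array that presumably you also created and whose lifetime you control. Dereferencing one of these device_ptr objects in host code, as in your last loop, is what copies the element from device to host.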