Search code examples
cudathrustcuda-streamscuda-graphs

CUDA graph stream capture with thrust::reduce


When I am trying to capture stream execution to build CUDA graph, call to thrust::reduce causes a runtime error cudaErrorStreamCaptureUnsupported: operation not permitted when stream is capturing. I have tried returning the reduction result to both host and device variables, and I am calling reduction in a proper stream by the means of thrust::cuda::par.on(stream). Is there any way I can add thrust functions execution to CUDA graphs?


Solution

  • Thrust's reduction operation is a blocking operation on the host side. I am assuming that you are using the result of reduction as a parameter to one of your following kernels. So that when you are capturing a CUDA graph, it cannot instantiate the graph executable because you are dependent on a variable that is on the host side but not available until the reduction kernel finishes execution. As a solution, you can try adding a host node to your graph that returns the result of the reduction.