When I try to capture stream execution to build a CUDA graph, a call to thrust::reduce causes the runtime error cudaErrorStreamCaptureUnsupported: operation not permitted when stream is capturing. I have tried returning the reduction result to both host and device variables, and I am calling the reduction on the proper stream by means of thrust::cuda::par.on(stream). Is there any way I can add the execution of Thrust functions to CUDA graphs?
Thrust's reduction is a blocking operation on the host side: thrust::reduce returns its result to the calling host code, so the host has to wait for the reduction kernel to finish. I am assuming that you use the result of the reduction as a parameter to one of the kernels that follow. In that case the capture fails, because the graph would depend on a host-side value that is not available until the reduction kernel has finished executing, and waiting for it is not permitted while the stream is capturing. As a solution, you can try adding a host node to your graph that returns the result of the reduction.
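To illustrate the dependency problem described above, here is a minimal sketch of one way to keep the reduction result off the host so that the whole sequence can be captured. Note that it swaps in a different technique from the host node suggested above: it uses cub::DeviceReduce::Sum (CUB ships alongside Thrust) instead of thrust::reduce, with temporary storage allocated before capture begins and the result written to device memory. The kernel scale_by_sum, the helper run_captured_reduce, and the CHECK macro are hypothetical names made up for this example, and it assumes a toolkit recent enough to provide cudaGraphInstantiateWithFlags.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical error-checking helper for this sketch.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Hypothetical consumer kernel: reads the reduction result from device
// memory instead of receiving it as a host-side kernel argument.
__global__ void scale_by_sum(float* data, const float* sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] /= *sum;
}

// Captures "reduce, then consume the result" into a graph, launches it
// once, and returns the first element of the scaled array.
float run_captured_reduce(int n) {
    std::vector<float> h_in(n, 1.0f);
    float *d_in = nullptr, *d_sum = nullptr;
    CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CHECK(cudaMalloc(&d_sum, sizeof(float)));
    CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(float),
                     cudaMemcpyHostToDevice));

    // Query the temporary storage size and allocate it BEFORE capture,
    // since cudaMalloc is not allowed while the stream is capturing.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    CHECK(cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_sum, n));
    CHECK(cudaMalloc(&d_temp, temp_bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    cudaGraph_t graph;
    cudaGraphExec_t exec;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    // Asynchronous reduction on the capturing stream; result stays on device.
    CHECK(cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_sum, n, stream));
    scale_by_sum<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_sum, n);
    CHECK(cudaStreamEndCapture(stream, &graph));
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Nothing has executed yet; the work runs when the graph is launched.
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d_in, sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_temp));
    CHECK(cudaFree(d_sum));
    CHECK(cudaFree(d_in));
    return first;
}

int main() {
    std::printf("first element after scaling: %g\n",
                run_captured_reduce(1 << 20));
    return 0;
}
```

The design choice here is that the consuming kernel takes a device pointer to the sum rather than the value itself, so no host-side wait is needed between the two graph nodes. Whether this fits depends on how the result is actually used in your code.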