CUDA FFT plan reuse across multiple 'overlapped' CUDA Stream launches

I'm trying to improve the performance of my code by overlapping asynchronous memory transfers with GPU computation.

Formerly, my code created an FFT plan once and then used it multiple times. In that situation the time spent creating the CUDA FFT plan is negligible, although according to this earlier post it could be quite significant.

Now that I'm moving to streams, what I'm doing is creating the "same" plan "multiple times" and then setting the CUDA FFT stream on each one. According to the answers given by some of you in this other post, this is wasteful. But is there any other way to do it?

NOTE: I'm acquiring the data in real time, so launching a "batch" CUDA FFT is out of the question. What I'm doing is to create and launch a new CUDA stream as a result of a complete pulse transmission.

NOTE 2: I was also considering using a "pool" of "CUDA Streams/FFT Plans" instead, but I don't think that would be an elegant, sensible solution. Any thoughts?

Otherwise, is there a way to "copy" an existing FFT plan before I assign the CUDA stream?

Thanks guys!/gals? Hopefully meet some of you in San Jose. =)

Omar

Solution

What I'm doing is to create and launch a new CUDA stream as a result of a complete pulse transmission.

Re-use the streams, rather than creating a new stream each time. Then you can re-use the plan created for that stream ahead of time, and you have no need to recreate the "same" plan on-the-fly.

Perhaps this is what you mean by the pool of streams method. Your criticism is that it is not "elegant" or "sensible". I have no idea what that means. Stream re-use in pipelined algorithms is a common tactic, if for no other reason than to avoid the cudaStreamCreate overhead (whatever it may be, large or small).

A cuFFT plan has a stream associated with it, and a plan is an opaque container: cuFFT provides no call to copy one, with or without its stream association.
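To make the stream re-use suggestion concrete, here is a minimal sketch of a fixed pool in which each slot owns one stream, one plan bound to that stream with cufftSetStream, and its own buffers. Everything specific in it is an assumption for illustration and not taken from the question: the Slot struct, the constants kNumSlots, kFftSize and kNumPulses, the single-precision C2C transform, and the loop that fakes pulse acquisition. Error checking is reduced to a single macro for the CUDA runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical sizes, chosen only for illustration.
constexpr int kNumSlots  = 4;     // streams/plans kept alive in the pool
constexpr int kFftSize   = 4096;  // complex samples per pulse
constexpr int kNumPulses = 16;    // pulses processed in this demo

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One pool entry: a stream, the plan bound to it, and private buffers.
struct Slot {
    cudaStream_t  stream;
    cufftHandle   plan;
    cufftComplex* hostBuf;  // pinned memory, needed for truly async copies
    cufftComplex* devBuf;
};

int main() {
    const size_t bytes = kFftSize * sizeof(cufftComplex);
    Slot slots[kNumSlots];

    // One-time setup: the plan creation cost is paid kNumSlots times in total.
    for (Slot& s : slots) {
        CUDA_CHECK(cudaStreamCreate(&s.stream));
        if (cufftPlan1d(&s.plan, kFftSize, CUFFT_C2C, 1) != CUFFT_SUCCESS ||
            cufftSetStream(s.plan, s.stream) != CUFFT_SUCCESS) {
            std::fprintf(stderr, "cuFFT plan setup failed\n");
            return EXIT_FAILURE;
        }
        CUDA_CHECK(cudaMallocHost(reinterpret_cast<void**>(&s.hostBuf), bytes));
        CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&s.devBuf), bytes));
    }

    for (int pulse = 0; pulse < kNumPulses; ++pulse) {
        Slot& s = slots[pulse % kNumSlots];  // round-robin slot re-use

        // Wait for the slot's previous pulse before touching its buffers.
        // A real pipeline would consume the previous result in hostBuf here.
        CUDA_CHECK(cudaStreamSynchronize(s.stream));

        // Stand-in for real-time acquisition of one pulse.
        for (int i = 0; i < kFftSize; ++i) {
            s.hostBuf[i].x = static_cast<float>(pulse);
            s.hostBuf[i].y = 0.0f;
        }

        // Copy in, transform in place, copy out: all queued on the slot's stream.
        CUDA_CHECK(cudaMemcpyAsync(s.devBuf, s.hostBuf, bytes,
                                   cudaMemcpyHostToDevice, s.stream));
        if (cufftExecC2C(s.plan, s.devBuf, s.devBuf, CUFFT_FORWARD) != CUFFT_SUCCESS) {
            std::fprintf(stderr, "cufftExecC2C failed\n");
            return EXIT_FAILURE;
        }
        CUDA_CHECK(cudaMemcpyAsync(s.hostBuf, s.devBuf, bytes,
                                   cudaMemcpyDeviceToHost, s.stream));
    }

    // Drain the pipeline and release everything once, at shutdown.
    for (Slot& s : slots) {
        CUDA_CHECK(cudaStreamSynchronize(s.stream));
        cufftDestroy(s.plan);
        CUDA_CHECK(cudaFree(s.devBuf));
        CUDA_CHECK(cudaFreeHost(s.hostBuf));
        CUDA_CHECK(cudaStreamDestroy(s.stream));
    }
    return 0;
}
```

This should build with something like nvcc pool.cu -lcufft (the file name is arbitrary). The pool size is a tuning knob: it only needs to be large enough that a slot has finished its previous pulse by the time the round-robin comes back to it, so the synchronize at the top of the loop rarely blocks.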