concurrency cuda synchronization cusolver

Heavy use of cudaFree inside the cuSPARSE tridiagonal solver

I am using the cusparseDgtsv_nopivot function to solve tridiagonal systems of equations. The output is correct, but the function does not make proper use of CUDA multi-streaming. The nvvp profiler shows that, although every call to this solver is issued in a different stream, the calls never overlap. I suspected implicit synchronization and found through nvvp that the library function makes many calls to cudaFree in between. Is there a way to avoid this implicit synchronization?

Pseudocode showing how I use cuSPARSE:

create array of streams[];
create cusparse handle;
for (int i = 0; i < Nsystem; i++) {
    // queue each solve in its own stream
    cusparseSetStream(handle, streams[i]);
    cusparseDgtsv_nopivot(handle, var for linear system i);
}
destroy cusparse handle;

PS: a similar cudaFree issue, concerning matrices, was raised and solved: here.

Solution

The really short answer is no. There is presently no way to modify the synchronisation behaviour of cudaFree within the runtime API.
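If you want to see the effect in isolation, a minimal sketch along these lines may help. It launches a long-running kernel in a non-default stream and times how long the host is held up by a cudaFree on a buffer the kernel never touches. The `spin` kernel, the `time_cuda_free_ms` helper and the cycle count are illustrative assumptions, not anything taken from cuSPARSE, and error checking is omitted for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: busy-waits for roughly `cycles` device clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launches `spin` in its own stream, then measures the host-side time
// spent inside cudaFree on an unrelated allocation.
double time_cuda_free_ms(long long spinCycles) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    double *unrelated = nullptr;
    cudaMalloc(&unrelated, 1 << 20);       // buffer the kernel never touches

    spin<<<1, 1, 0, stream>>>(spinCycles); // asynchronous launch

    auto t0 = std::chrono::steady_clock::now();
    cudaFree(unrelated);                   // the call being timed
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Assumed cycle count; pick something long enough to be visible on your GPU.
    double ms = time_cuda_free_ms(500000000LL);
    printf("cudaFree returned after %.1f ms\n", ms);
    return 0;
}
```

If cudaFree synchronizes as described, the reported time should be close to the kernel's run time rather than a few microseconds. The same pattern shows up in nvvp as a gap in which no other stream makes progress.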

So if, as you hypothesize, the cause of the problem is internal use of malloc and free within cuSPARSE, then the only thing to do would be to report your use case to NVIDIA and see whether they can either propose a workaround or provide an "expert" version of the routine where the caller manages scratch space explicitly.
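For what it is worth, newer cuSPARSE releases do expose a caller-managed-buffer variant of this routine: cusparseDgtsv2_nopivot, together with cusparseDgtsv2_nopivot_bufferSizeExt to query the scratch size. Below is a hedged sketch of how the loop from the question might look with it. The system size, the matrix values and the `solve_in_streams` helper are made up for illustration, error checking is omitted, and I have not confirmed that this removes every internal synchronization point, so it is worth checking the timeline in the profiler.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cusparse.h>

// Hypothetical helper: solves nSystems identical, diagonally dominant
// tridiagonal systems of size m, one per stream, and returns the largest
// residual |A*x - b| of system 0.
double solve_in_streams(int nSystems, int m) {
    std::vector<double> hdl(m, 1.0), hd(m, 4.0), hdu(m, 1.0), hb(m, 1.0), hx(m);
    hdl[0] = 0.0;       // first sub-diagonal entry must be zero
    hdu[m - 1] = 0.0;   // last super-diagonal entry must be zero
    const size_t bytes = m * sizeof(double);

    cusparseHandle_t handle;
    cusparseCreate(&handle);

    std::vector<cudaStream_t> streams(nSystems);
    std::vector<double*> dl(nSystems), d(nSystems), du(nSystems), b(nSystems);
    std::vector<void*> scratch(nSystems);

    // Setup phase: all allocations and copies happen before any solve is queued.
    for (int i = 0; i < nSystems; i++) {
        cudaStreamCreate(&streams[i]);
        cudaMalloc(&dl[i], bytes);
        cudaMalloc(&d[i], bytes);
        cudaMalloc(&du[i], bytes);
        cudaMalloc(&b[i], bytes);
        cudaMemcpy(dl[i], hdl.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d[i], hd.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(du[i], hdu.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(b[i], hb.data(), bytes, cudaMemcpyHostToDevice);
    }

    // Ask cuSPARSE how much scratch one solve needs, then give each stream its own buffer.
    size_t scratchBytes = 0;
    cusparseDgtsv2_nopivot_bufferSizeExt(handle, m, 1, dl[0], d[0], du[0],
                                         b[0], m, &scratchBytes);
    for (int i = 0; i < nSystems; i++) {
        cudaMalloc(&scratch[i], scratchBytes);
    }

    // Solve phase: one solve per stream; the solution overwrites b[i].
    for (int i = 0; i < nSystems; i++) {
        cusparseSetStream(handle, streams[i]);
        cusparseDgtsv2_nopivot(handle, m, 1, dl[i], d[i], du[i],
                               b[i], m, scratch[i]);
    }
    cudaDeviceSynchronize();

    // Check the residual of system 0 on the host.
    cudaMemcpy(hx.data(), b[0], bytes, cudaMemcpyDeviceToHost);
    double maxResidual = 0.0;
    for (int i = 0; i < m; i++) {
        double r = hd[i] * hx[i] - hb[i];
        if (i > 0)     r += hdl[i] * hx[i - 1];
        if (i < m - 1) r += hdu[i] * hx[i + 1];
        maxResidual = std::fmax(maxResidual, std::fabs(r));
    }

    // Teardown: frees happen only after all work has finished.
    for (int i = 0; i < nSystems; i++) {
        cudaFree(dl[i]); cudaFree(d[i]); cudaFree(du[i]);
        cudaFree(b[i]);  cudaFree(scratch[i]);
        cudaStreamDestroy(streams[i]);
    }
    cusparseDestroy(handle);
    return maxResidual;
}

int main() {
    printf("max residual: %g\n", solve_in_streams(4, 1024));
    return 0;
}
```

Build with something like `nvcc example.cu -lcusparse`. The point of the design is that every cudaMalloc and cudaFree sits outside the solve loop, so any remaining serialization seen in the profiler would have to come from the library kernels themselves rather than from memory management.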