I'm using CUDA 5.5, the latest version, and the new CUBLAS API has a stateful flavor: every function takes a cublasHandle_t,
e.g.
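cublasHandle_t handle;
// Create the CUBLAS library context (allocates host and device resources)
cublasCreate_v2(&handle);
// C = alpha * op(A) * op(B) + beta * C, in double precision
cublasDgemm_v2(handle, A_trans, B_trans, m, n, k, &alpha, d_A, lda, d_B, ldb, &beta, d_C, ldc);
// Release the resources held by the context
cublasDestroy_v2(handle);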
Is it good practice to reuse this handle instance as much as possible, like some sort of session,
or is the performance impact so small that it makes more sense to reduce code complexity by using short-lived handles, creating and destroying them continuously?
I think reusing the handle is good practice, for two reasons: