
CUDA 4 inter-GPU synchronization


Has anyone successfully tested this feature on CUDA 4? I can't get it to work correctly, and I'm not sure whether it's a bug in my code or the feature is not fully implemented yet.

For example, I want to launch a kernel after copying memory from one GPU to another. stream0 and mem0 are created on GPU 0; stream1 and mem1 are created on GPU 1. Both GPUs are Fermi Teslas (C2070).

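// Copy mem0 (GPU 0) to mem1 (GPU 1) asynchronously on stream0.
cudaMemcpyAsync( mem1, mem0, size, cudaMemcpyDefault, stream0 );
// Record an event in stream0; it completes once the copy has finished.
cudaEventRecord(P2Pevent, stream0);

// Make stream1 wait for the event before it runs any later work.
cudaStreamWaitEvent(stream1, P2Pevent, 0);
cudaKernel<<<block,thread,0,stream1>>>(mem1);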

Sometimes the memory does not seem to be ready when the kernel runs, so the result is incorrect. If I add a cudaDeviceSynchronize() on GPU 0, it works fine.

Thanks


Solution

  • It was a bug in my code! The feature does work correctly. Thanks, guys.
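The thread does not say what the bug was, so the following is only a minimal sketch of the pattern described in the question, not the poster's actual code. It assumes two GPUs with unified virtual addressing (needed for cudaMemcpyDefault), and the names incrementKernel, p2pCopyThenIncrement, and CHECK are hypothetical helpers introduced for illustration. The one detail it makes explicit is that each stream and the event are created and used while their own device is current.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: bail out of the enclosing function with -1 on any CUDA error.
// (A sketch only: it does not release resources on the error path.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds 1 to every element of the buffer.
__global__ void incrementKernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Copies a buffer from GPU 0 to GPU 1, then runs a kernel on GPU 1 that is
// ordered after the copy by an event. Returns the number of wrong elements,
// or -1 if fewer than two GPUs are present or a CUDA call fails.
int p2pCopyThenIncrement(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2)
        return -1;

    size_t size = (size_t)n * sizeof(int);
    std::vector<int> host(n, 41);
    int *mem0 = nullptr, *mem1 = nullptr;
    cudaStream_t stream0, stream1;
    cudaEvent_t p2pEvent;

    // GPU 0 owns the source buffer, the copy stream, and the event.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&mem0, size));
    CHECK(cudaStreamCreate(&stream0));
    CHECK(cudaEventCreateWithFlags(&p2pEvent, cudaEventDisableTiming));
    CHECK(cudaMemcpy(mem0, host.data(), size, cudaMemcpyHostToDevice));

    // GPU 1 owns the destination buffer and the kernel stream.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&mem1, size));
    CHECK(cudaStreamCreate(&stream1));

    // Issue the copy and record the event in stream0, with GPU 0 current.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpyAsync(mem1, mem0, size, cudaMemcpyDefault, stream0));
    CHECK(cudaEventRecord(p2pEvent, stream0));

    // With GPU 1 current, make stream1 wait for the copy, then launch.
    CHECK(cudaSetDevice(1));
    CHECK(cudaStreamWaitEvent(stream1, p2pEvent, 0));
    incrementKernel<<<(n + 255) / 256, 256, 0, stream1>>>(mem1, n);
    CHECK(cudaGetLastError());

    // Wait for the kernel, then read the result back from GPU 1.
    CHECK(cudaStreamSynchronize(stream1));
    CHECK(cudaMemcpy(host.data(), mem1, size, cudaMemcpyDeviceToHost));

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != 42) ++wrong;

    // Release each resource with its owning device current.
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(mem1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaEventDestroy(p2pEvent));
    CHECK(cudaStreamDestroy(stream0));
    CHECK(cudaFree(mem0));
    return wrong;
}
```

Calling p2pCopyThenIncrement(1 << 20) from main on a machine with two suitable GPUs should return 0 if the event ordering holds; on a machine with fewer than two GPUs it returns -1. No cudaDeviceSynchronize() on GPU 0 is used, so a nonzero count would point at the synchronization rather than at the copy itself.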