I expected the GTX 680, one of the latest GPUs, to support concurrent data transfer in both directions (host-to-device and device-to-host at the same time). But when I run the CUDA SDK "Device Query", the "Concurrent copy and execution" entry reports "Yes with 1 copy engine", which means the GPU can overlap a copy with kernel execution but cannot transfer data in both directions concurrently.
Do you see the same result on your cards? And can you tell me which devices are capable of concurrent bidirectional data transfer?
Thanks!
Dual copy engines are available on Tesla cards and modules:
http://www.nvidia.com/object/why-choose-tesla.html
http://www.nvidia.com/docs/IO/43395/NV-DS-Tesla-C2075.pdf
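If you want to check a particular card yourself without reading the full Device Query output, the copy engine count is exposed as the `asyncEngineCount` field of `cudaDeviceProp`. Below is a minimal sketch, not taken from the SDK sample; the helper name `describeCopyEngines` and its wording are my own for illustration. A value of 2 or more should mean copies in both directions can overlap with each other and with a kernel, 1 means a single copy can overlap with a kernel, and 0 means no overlap.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): turns the
// asyncEngineCount value into a human-readable description.
std::string describeCopyEngines(int asyncEngineCount) {
    if (asyncEngineCount >= 2) {
        return "copies in both directions can overlap";
    }
    if (asyncEngineCount == 1) {
        return "one copy at a time can overlap with a kernel";
    }
    return "no copy/kernel overlap";
}

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the copy engine count for every CUDA device in the system.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d (%s): asyncEngineCount = %d (%s)\n",
               dev, prop.name, prop.asyncEngineCount,
               describeCopyEngines(prop.asyncEngineCount).c_str());
    }
    return 0;
}
```

Note that even on a card with two copy engines, the transfers only overlap if they are issued with `cudaMemcpyAsync` on page-locked host memory in different non-default streams.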
Also, some Quadro models provide dual copy engines, e.g.: