I have a headless workstation running Ubuntu 12.04 server and recently installed a new Tesla C2070 card, but when I run the examples from the CUDA SDK, I get the following error:
NVIDIA_GPU_Computing_SDK/C/bin/linux/release% ./reduction
[reduction] starting...
Using Device 0: Tesla C2070
Reducing array of type int
16777216 elements
256 threads (max)
64 blocks
reduction.cpp(473) : cudaSafeCallNoSync() Runtime API error 39 : uncorrectable ECC error encountered.
In fact, this error occurs with every example except "deviceQuery".
I'm using kernel 3.2.0, NVIDIA driver 295.41, and CUDA 4.2.9.
After a lot of searching, I found a suggestion to disable ECC support with:
nvidia-smi -g 0 --ecc-config=0
which worked. But the question is: how reliable will GPU computing be with ECC support disabled?
Any advice, suggestion or solution will be highly appreciated.
-Konstantin
I'm wondering if this may be some sort of compatibility issue rather than a bad card. I'm suffering from the same problem with a Tesla C2075 on the same Ubuntu version. We contacted NVIDIA, and they told us that double-bit ECC errors (as seen using nvidia-smi -q in Linux) meant that the card was probably broken. We obtained a replacement, but it has exactly the same issues.
It seems unlikely that both boards I have had are broken in the same way, so we're going to try the card in another machine if we can find a suitable one.
I'll post anything interesting that we learn.
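For anyone wondering why a double-bit error is reported as "uncorrectable": as far as I understand it, ECC memory of this kind typically uses a single-error-correct, double-error-detect (SECDED) scheme. The toy sketch below is only an illustration of that idea using an extended Hamming code over 4 data bits. It is not the code the Tesla hardware actually uses, and the function names encode and decode are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2            # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits (0 if intact).
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: treat as one flipped bit and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, so the
        # error is detected but its location cannot be determined.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))

    one_flip = list(word)
    one_flip[5] ^= 1
    print(decode(one_flip))

    two_flips = list(one_flip)
    two_flips[2] ^= 1
    print(decode(two_flips))
```

With one flipped bit the decoder silently repairs the word; with two it can only report that something is wrong. With ECC disabled there is no such check at all, so a flipped bit would simply pass through into the results unnoticed.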