I am performing deep learning on my machine, which has 4 GPUs. During training, the third GPU is consistently lost (an error reading "GPU lost" appears, and the logs indicate it is this specific GPU). I am assuming it's a thermal issue or that the GPU has become unseated.
Before I fix this hardware issue, I would like to continue using the remaining 3 GPUs ('/gpu:0', '/gpu:1', '/gpu:3'). Is there a way to specify, in Keras, that these are the GPUs I want to use (or alternatively, to ignore '/gpu:2')?
I have seen a lot on specifying GPU vs CPU usage, and on selecting a single GPU on a multi-GPU machine, but not on this specific issue (isolating a particular subset of GPUs).
You can try using the CUDA_VISIBLE_DEVICES environment variable:
import os

# Hide '/gpu:2'; only GPUs 0, 1, and 3 will be visible to TensorFlow/Keras
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,3"
Set this before importing Keras/TensorFlow, so the CUDA runtime never enumerates the faulty GPU.
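
In case it helps, here is a minimal sketch of the full pattern (assuming the TensorFlow backend). Note that the visible GPUs are renumbered, so inside TensorFlow the three remaining cards appear as '/gpu:0', '/gpu:1', '/gpu:2', and any explicit tf.device() placements should use the new indices.

import os

# Must run before any TensorFlow/Keras import so CUDA never sees the bad card.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,3"

import tensorflow as tf
from tensorflow.python.client import device_lib

# Sanity check: this should report exactly three GPU devices, renumbered 0..2.
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print(gpus)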