python tensorflow keras gpu

Specifying which 3 GPUs to use (among 4 on a machine) using Keras and Tensorflow


I am doing deep learning on a machine that has 4 GPUs. During training, the third GPU is consistently lost (a "GPU lost" error comes up, and the logs indicate it is this specific GPU). I assume it's a thermal issue and that the GPU is becoming unseated.

Before I fix this hardware issue, I would like to continue using the other 3 GPUs ('/gpu:0', '/gpu:1', '/gpu:3'). Is there a way to specify, in Keras, that these are the GPUs I want to use (or, alternatively, to ignore '/gpu:2')?

I have seen a lot on specifying GPU vs. CPU usage and on selecting one GPU on a multi-GPU machine, but nothing on this specific issue (isolating a subset of specific GPUs).


Solution

  • You can try setting the CUDA_VISIBLE_DEVICES environment variable:

    import os
    # Expose only physical GPUs 0, 1, and 3 to CUDA (hides GPU 2)
    os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,3"
    

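    Set this before importing Keras/TensorFlow; the variable has to be in place before CUDA is initialized. Note that the visible devices are renumbered inside the process, so the three remaining GPUs appear as '/gpu:0', '/gpu:1', and '/gpu:2'.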
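
If the set of working GPUs may change, the value can be built rather than hard-coded. The following is only a minimal sketch: the helper name `visible_devices_excluding` is hypothetical (not part of Keras or TensorFlow), and it assumes the faulty GPU is index 2 in PCI bus order. Setting `CUDA_DEVICE_ORDER` to `PCI_BUS_ID` is an optional extra that should make CUDA's indices match the ones `nvidia-smi` reports.

```python
import os


def visible_devices_excluding(total_gpus, excluded):
    """Build a CUDA_VISIBLE_DEVICES value that skips the excluded indices.

    Hypothetical helper for illustration; not a Keras/TensorFlow API.
    """
    skip = set(excluded)
    return ",".join(str(i) for i in range(total_gpus) if i not in skip)


# Assumption: order devices by PCI bus ID so indices match nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Hide GPU 2 on a 4-GPU machine; the resulting value is "0,1,3".
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_excluding(4, [2])

# Import Keras/TensorFlow only after this point, for example:
# import tensorflow as tf
```

Once the hardware is fixed, calling the helper with an empty exclusion list would expose all four GPUs again.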