Tensorflow contrib.learn.Estimator multi-GPU


In order to use the contrib.learn.Estimator for multi-GPU training, I am attempting to specify GPU assignments in my model_fn.

In pseudo-code:

def model_fn(X, y):
    with tf.device('/gpu:1'):
        ... various tensorflow ops for model ...

        return predictions, loss, train_op

Everything works fine without the tf.device('/gpu:1') call, but with it I encounter the following error:

InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'save/ShardedFilename_1': Could not satisfy explicit device
specification '/device:GPU:1' because no supported kernel 
for GPU devices is available.

I do not believe that I am adding the offending op to the graph myself; rather, judging by the 'save/' prefix in the node name, it appears to be injected by the Estimator's snapshot (checkpoint saving) functionality.

I believe that the solution is to set allow_soft_placement=True so that ops without a GPU kernel fall back to the CPU, but it's not obvious to me how that option is exposed when dealing with contrib.learn.Estimator.

I see that the option is usually set in ConfigProto & passed to the session, but I've been using the Estimator's functionality to manage the session for me. Should I be taking control of the session creation, or am I missing a parameter somewhere to accomplish this?
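For reference, this is roughly what the "usual" approach looks like when you own the session yourself. It is only a minimal sketch, not Estimator code: it assumes a TensorFlow release that provides the tf.compat.v1 namespace (1.13 or later), and the tiny addition op stands in for the real model ops.

```python
import tensorflow as tf

# Assumption: tf.compat.v1 is available (TensorFlow 1.13+ or 2.x).
tf1 = tf.compat.v1

# allow_soft_placement lets TensorFlow move an op to another device when the
# requested device has no kernel for it (or does not exist on this machine).
config = tf1.ConfigProto(allow_soft_placement=True)

graph = tf.Graph()
with graph.as_default():
    # Same explicit pin as in the question; the op is a placeholder for the model.
    with tf.device('/gpu:1'):
        total = tf.constant(1.0) + tf.constant(2.0)

# The config is handed to the session, which is the part Estimator normally hides.
with tf1.Session(graph=graph, config=config) as sess:
    result = sess.run(total)
```

With soft placement enabled, the explicit '/gpu:1' pin becomes a preference rather than a hard requirement, which is why CPU-only ops such as the checkpoint-related ones can still be placed.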

Many thanks in advance for any advice.


Solution

  • This was fixed when Estimator moved out of contrib in TensorFlow 1.0.
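
If you still want to set the option explicitly with the non-contrib Estimator, one way is to pass a session config through RunConfig. The sketch below assumes a TensorFlow release that still ships tf.estimator (roughly 1.4 through 2.15) and in which RunConfig accepts a session_config argument; model_fn in the trailing comment refers to the function from the question and is not defined here.

```python
import tensorflow as tf

# Assumption: tf.compat.v1 and tf.estimator are both available in this release.
session_config = tf.compat.v1.ConfigProto(allow_soft_placement=True)

# RunConfig carries the session settings that the Estimator uses when it
# creates sessions on your behalf.
run_config = tf.estimator.RunConfig(session_config=session_config)

# Hypothetical usage, with model_fn being your own model function:
# estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
```

This keeps session management inside the Estimator, so there is no need to take over session creation just to change device placement behaviour.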