Tags: python, python-2.7, tensorflow, redhat

Can TensorFlow run with multiple CPUs (no GPUs)?


I'm trying to learn distributed TensorFlow. I tried out a piece of code as explained here:

import tensorflow as tf

# x and y_ are the input and label placeholders defined earlier in the example.
with tf.device("/cpu:0"):        # pin the variables to the first CPU device
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

with tf.device("/cpu:1"):        # pin the model and loss to a second CPU device
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

Getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:CPU:1"](Placeholder, Variable/read)]]

Meaning that TensorFlow does not recognize CPU:1.

I'm running on a Red Hat server with 40 logical CPUs (as reported by cat /proc/cpuinfo | grep processor | wc -l).
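
A quick way to see what the runtime actually registers (a diagnostic sketch for TF 1.x, not part of the original post) is to list the local devices; with the default configuration only /cpu:0 shows up, no matter how many cores the OS reports:

from tensorflow.python.client import device_lib

# Print every device the TensorFlow runtime has registered.
# With the default configuration this shows only /cpu:0,
# even on a 40-core machine.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)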

Any ideas?


Solution

  • Following the link in the comment:

    It turns out the session has to be configured with a CPU device count greater than 1:

    # Tell TensorFlow to create 8 CPU devices (/cpu:0 ... /cpu:7).
    config = tf.ConfigProto(device_count={"CPU": 8})
    with tf.Session(config=config) as sess:
        ...
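
    For completeness, here is a minimal sketch of how the fix combines with the placement code from the question (TF 1.x assumed; log_device_placement is added here only to verify where each op lands and is not part of the answer above):

    import tensorflow as tf

    # Expose 8 CPU devices (/cpu:0 ... /cpu:7) to the runtime and log
    # each op's placement so the explicit assignments can be verified.
    config = tf.ConfigProto(device_count={"CPU": 8},
                            log_device_placement=True)

    x = tf.placeholder(tf.float32, [None, 784])

    with tf.device("/cpu:0"):
        W = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))

    with tf.device("/cpu:1"):
        y = tf.nn.softmax(tf.matmul(x, W) + b)

    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # ... run training or evaluation here ...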
    

    Kind of shocking that I missed something so basic, and that no one could pinpoint an error that seems so obvious.

    Not sure if it's a problem with me or the TensorFlow code samples and documentation. Since it's Google, I'll have to say that it's me.