I have two machines: machine1 has GPUs and machine2 only has a CPU. I want to know whether the two machines can use multi-worker training in TensorFlow, i.e., during distributed training machine1 uses its GPUs while machine2 uses its CPU.
The TensorFlow version is 2.1.0.
The answer is no. When I ran distributed training following this tutorial:
https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras
the following error occurred:
tensorflow.python.framework.errors_impl.InternalError: Collective Op CollectiveBcastSend: Broadcast(1) is assigned to device /job:worker/replica:0/task:0/device:GPU:0 with type GPU and group_key 1 but that group has type CPU [Op:CollectiveBcastSend]
The error indicates that the collective ops require every worker in the group to use the same device type. After forcing machine1 to use the CPU by setting, before importing TensorFlow:
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
training runs successfully on the CPUs of both machines.
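For reference, a minimal sketch of the per-worker setup, with hypothetical host addresses: each machine builds the `TF_CONFIG` cluster spec that `MultiWorkerMirroredStrategy` reads, and machine1 additionally hides its GPUs via `CUDA_VISIBLE_DEVICES` so that all workers place the collective ops on CPU. The environment variables must be set before `import tensorflow`, so those lines are shown commented at the end.

```python
import json
import os

# Hypothetical addresses for the two machines; replace with your own hosts/ports.
WORKERS = ["machine1.example.com:12345", "machine2.example.com:23456"]

def make_tf_config(task_index):
    """Build the TF_CONFIG JSON that MultiWorkerMirroredStrategy reads."""
    return json.dumps({
        "cluster": {"worker": WORKERS},
        "task": {"type": "worker", "index": task_index},
    })

# On machine1 (task index 0): hide the GPUs so every worker uses CPU,
# avoiding the CollectiveBcastSend group-type mismatch.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
os.environ["TF_CONFIG"] = make_tf_config(0)

# These must run only AFTER the environment variables above are set:
# import tensorflow as tf
# strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```

On machine2 the same script is used with `make_tf_config(1)` (and the `CUDA_VISIBLE_DEVICES` line is harmless there, since it has no GPU).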