tensorflow, amazon-sagemaker, distributed-training, horovod

Can Horovod with TensorFlow work on non-GPU instances in Amazon SageMaker?


I want to perform distributed training on Amazon SageMaker. The code is written with TensorFlow and is similar to the following example, for which I think CPU instances should be enough: https://github.com/horovod/horovod/blob/master/examples/tensorflow_word2vec.py

Can Horovod with TensorFlow work on non-GPU instances in Amazon SageMaker?


Solution

  • Yes, Horovod on Amazon SageMaker can run on both CPU and GPU instances. The following example notebook shows the setup:

    https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/tensorflow_script_mode_horovod/tensorflow_script_mode_horovod.ipynb
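
    As a rough illustration of what a CPU-only setup might look like with the SageMaker Python SDK (v2), here is a minimal sketch. The instance type `ml.c5.xlarge`, the script name `train.py`, the framework and Python versions, and the helper function names are all assumptions chosen for illustration, not values taken from the notebook; check which TensorFlow container versions ship with Horovod before relying on them. The key part is the `distribution` argument, which asks SageMaker to launch the script through MPI, which is how Horovod is run there.

```python
def horovod_cpu_estimator_kwargs(role, entry_point="train.py",
                                 instance_count=2,
                                 instance_type="ml.c5.xlarge",
                                 processes_per_host=1):
    """Build keyword arguments for sagemaker.tensorflow.TensorFlow.

    All defaults are illustrative assumptions: a CPU instance type,
    a hypothetical training script name, and one Horovod process per host.
    """
    return {
        "entry_point": entry_point,        # script that calls hvd.init()
        "role": role,                      # IAM role ARN for SageMaker
        "instance_count": instance_count,  # number of training hosts
        "instance_type": instance_type,    # a non-GPU instance type
        "framework_version": "2.4.1",      # assumed TensorFlow version
        "py_version": "py37",              # assumed Python version
        # Launch the script via MPI so Horovod can coordinate the workers.
        "distribution": {
            "mpi": {
                "enabled": True,
                "processes_per_host": processes_per_host,
            }
        },
    }


def launch_training(role, train_s3_uri):
    """Start the training job (requires the sagemaker package and AWS credentials)."""
    # Imported here so the configuration helper above works without the SDK.
    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(**horovod_cpu_estimator_kwargs(role))
    estimator.fit(train_s3_uri)
    return estimator
```

    On CPU instances, `processes_per_host` would typically be set according to how many Horovod workers you want per machine rather than to a GPU count; one process per host is just a conservative starting point here.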