Tags: tensorflow, tensorflow-datasets, tensorflow-estimator

Multi-GPU training with Estimators


In this tutorial, https://www.tensorflow.org/beta/tutorials/distribute/multi_worker_with_estimator, it says that when using Estimator for multi-worker training, it is necessary to shard the dataset by the number of workers to ensure model convergence. By "multi-worker", do they mean multiple GPUs in one system, or distributed training across machines? I have 2 GPUs in one system; do I have to shard the dataset?
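
For reference, my understanding of the sharding that tutorial describes is roughly this (a sketch only; `NUM_WORKERS`, `WORKER_INDEX`, and the toy dataset are placeholders, not values from the tutorial):

```python
import tensorflow as tf

# Placeholder values: in a real multi-worker job these would come from the
# cluster configuration (e.g. TF_CONFIG), not hard-coded constants.
NUM_WORKERS = 2
WORKER_INDEX = 0

def input_fn():
    dataset = tf.data.Dataset.range(1000)
    # Each worker keeps only every NUM_WORKERS-th element, starting at its
    # own index, so the workers see disjoint slices of the data.
    dataset = dataset.shard(NUM_WORKERS, WORKER_INDEX)
    return dataset.shuffle(100).batch(32).repeat()
```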


Solution

  • No, you don't - "multi-worker" refers to a cluster of machines.

    For a single machine with multiple GPUs you don't need to shard the dataset.

    This tutorial explains MirroredStrategy, which is what you want for multiple GPUs on one machine (a minimal sketch follows below): https://www.tensorflow.org/beta/tutorials/distribute/keras

    For the distribution strategies suited to other setups, see: https://www.tensorflow.org/beta/guide/distribute_strategy#types_of_strategies
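
A minimal sketch of the single-machine, 2-GPU case, assuming a premade DNNClassifier and a toy input_fn (both illustrative, not from the tutorial): the strategy is passed to the Estimator through `RunConfig(train_distribute=...)`, and the input_fn needs no sharding.

```python
import tensorflow as tf

def input_fn():
    # Toy in-memory dataset; no shard() call is needed because
    # MirroredStrategy runs on a single machine.
    features = {"x": tf.random.uniform([1000, 10])}
    labels = tf.random.uniform([1000], maxval=2, dtype=tf.int32)
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.shuffle(1000).batch(64).repeat()

# Mirror the model's variables across the local GPUs (both of yours).
strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)

estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column("x", shape=[10])],
    hidden_units=[32, 16],
    n_classes=2,
    config=config,
)

estimator.train(input_fn=input_fn, steps=100)
```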