amazon-web-services pytorch amazon-sagemaker

What is the difference between sagemaker-pytorch-training-toolkit and sagemaker-training-toolkit in SageMaker?


When porting PyTorch code or models to SageMaker, which one should we use:

PyTorch Training Toolkit (https://github.com/aws/sagemaker-pytorch-training-toolkit/) or SageMaker Training Toolkit (https://github.com/aws/sagemaker-training-toolkit)? What's the difference when using these toolkits?


Solution

  • The SageMaker PyTorch Training Toolkit repository used to be the repository for the SageMaker PyTorch Training containers, and similarly the SageMaker PyTorch Inference Toolkit repository was the repository for the SageMaker PyTorch Inference containers.

    At some point, AWS started building the Deep Learning containers directly from the Dockerfiles in the AWS Deep Learning Containers repository. The repositories above were therefore renamed: AWS now uses them to build a library that is installed into the DL containers to make them SageMaker-compatible for training.

    Example: the library is packaged from https://github.com/aws/sagemaker-pytorch-training-toolkit/blob/master/setup.py, and the resulting package is then installed into the DL container here: https://github.com/aws/deep-learning-containers/blob/1596489c9002cea08f8a2a7d2f4642c4b3727d52/pytorch/training/docker/1.6.0/py3/Dockerfile.cpu#L112
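To make "SageMaker-compatible" a bit more concrete: as far as we know, the generic contract comes from sagemaker-training-toolkit (the `sagemaker-training` package), which the PyTorch toolkit depends on and extends with PyTorch-specific setup. That contract runs your entry-point script inside the container, passes hyperparameters as command-line arguments, and exposes paths and cluster information through `SM_*` environment variables such as `SM_MODEL_DIR`, `SM_CHANNEL_<NAME>`, `SM_HOSTS` and `SM_NUM_GPUS`. The sketch below is a minimal entry point written against that contract. It assumes a data channel named `training` and uses hypothetical hyperparameters `epochs` and `lr`; the defaults let the same script run outside SageMaker.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths for a training entry point.

    Inside a SageMaker training container, hyperparameters arrive as
    command-line arguments and paths arrive via SM_* environment variables.
    The fallback defaults are assumptions so the script also runs locally.
    """
    parser = argparse.ArgumentParser()

    # Hypothetical hyperparameters for this sketch.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.01)

    # Where the trained model should be written (uploaded to S3 afterwards).
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    # Local path of the "training" data channel (channel name is an assumption).
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    # SM_HOSTS is a JSON list of host names; argparse applies `type` to string defaults.
    parser.add_argument("--hosts", type=json.loads, default=os.environ.get("SM_HOSTS", '["localhost"]'))
    # Number of GPUs on the current host, also converted from a string default.
    parser.add_argument("--num-gpus", type=int, default=os.environ.get("SM_NUM_GPUS", "0"))

    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Training for {args.epochs} epoch(s) on {args.hosts}; model dir: {args.model_dir}")
```

Reading `SM_*` variables with local fallbacks, rather than hard-coding `/opt/ml/...` paths, keeps the script usable both inside the PyTorch DL container and on a developer machine.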