
SageMaker downloads the training image every time it runs with Hugging Face


I am training my Hugging Face transformers model on SageMaker, which downloads the training image each time I submit a job:

...
Training - Downloading the training image..............................
...

This takes considerable time. Is there any way to skip that step?


Solution

  • Training jobs are ephemeral: each job runs on freshly provisioned instances, so the training Docker image, which is where your script runs, has to be downloaded again every time.

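    If you have repetitive workloads and want to shorten the startup time, take a look at warm pools, which keep the provisioned instances alive for a while after a job finishes so that a later job can reuse them: https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html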
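
As a rough sketch of what opting into a warm pool might look like with the SageMaker Python SDK: the `keep_alive_period_in_seconds` estimator argument is what requests the warm pool. Everything else here is an assumption for illustration and not taken from the question: the script name `train.py`, the instance type, the framework versions, the helper names `warm_pool_estimator_kwargs` and `submit_training_job`, and the 1 to 3600 second range check (my understanding of the current limit; check the linked page).

```python
def warm_pool_estimator_kwargs(role_arn, keep_alive_seconds=1800):
    """Build keyword arguments for a HuggingFace estimator that requests a warm pool.

    All concrete values below are placeholders; replace them with your own.
    """
    # Assumption: the keep-alive period must be between 1 and 3600 seconds.
    if not 1 <= keep_alive_seconds <= 3600:
        raise ValueError("keep_alive_seconds must be between 1 and 3600")

    return {
        "entry_point": "train.py",          # hypothetical training script
        "role": role_arn,                   # your SageMaker execution role ARN
        "instance_type": "ml.p3.2xlarge",   # placeholder instance type
        "instance_count": 1,
        "transformers_version": "4.26",     # placeholder versions; use a
        "pytorch_version": "1.13",          # combination your SDK supports
        "py_version": "py39",
        # Keep the instances alive after the job ends so that a later
        # matching job can reuse them instead of provisioning new ones.
        "keep_alive_period_in_seconds": keep_alive_seconds,
    }


def submit_training_job(role_arn):
    """Create the estimator and start the job (needs the sagemaker SDK and AWS credentials)."""
    from sagemaker.huggingface import HuggingFace  # imported lazily on purpose

    estimator = HuggingFace(**warm_pool_estimator_kwargs(role_arn))
    estimator.fit()
    return estimator
```

The first job still downloads the image. The saving should only show up on later jobs that are submitted while the pool is alive and that match its configuration (for example, the same instance type and count), as described on the linked page. Warm pool usage may also require a service quota increase for the instance type.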