Tags: python-3.x, boto3, amazon-sagemaker

AWS Sagemaker Training - Container Arguments - Boto3 API


I need guidance on passing command-line arguments to a SageMaker training job through the Boto3 API. Here is my Dockerfile:

FROM public.ecr.aws/ubuntu/ubuntu:22.04

LABEL version="2.0"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         build-essential \
         python3-dev \
         python3-pip \
         python3-setuptools \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN python3.10 -m pip install pip --upgrade && pip install --upgrade cython
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY requirements.txt .
RUN pip --no-cache-dir install -r requirements.txt

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code/:${PATH}"
ENV PYTHONPATH="/opt/ml/code/:${PYTHONPATH}"

COPY src/ /opt/ml/code/
WORKDIR /opt/ml/code/

ENTRYPOINT [ "python", "/opt/ml/code/entry_point.py" ]

The entry_point.py script is:

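import argparse

# run_inference and run_training are defined elsewhere in entry_point.py (not shown).
parser = argparse.ArgumentParser()
parser.add_argument("--mode", type=str, required=True)
# Named --region_id to match both the ContainerArguments and args.region_id below.
parser.add_argument("--region_id", type=int)

args = parser.parse_args()

if args.mode == "inference":
    run_inference(args.region_id)
elif args.mode == "training":
    run_training(args.region_id)
else:
    raise ValueError(f"Unknown mode: {args.mode}")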
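To see why the parser rejects the arguments, here is a small stand-alone sketch that rebuilds the same two options (with the option named `--region_id`, matching the ContainerArguments and the `args.region_id` lookups) and feeds explicit argv lists to `parse_args`. `build_parser` and `try_parse` are hypothetical helpers for this demo only. argparse matches `--mode` when the flag and its value are separate entries, or joined with `=` as one `--mode=training` entry; a single entry containing a space is treated as a stray positional, which reproduces the "required: --mode" error reported below.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Same options as entry_point.py (hypothetical demo copy).
    parser = argparse.ArgumentParser(prog="entry_point.py")
    parser.add_argument("--mode", type=str, required=True)
    parser.add_argument("--region_id", type=int)
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse argv and return the namespace, or None on a usage error.

    On error argparse prints usage to stderr and raises SystemExit(2),
    which is caught here so the demo keeps running.
    """
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# Flag and value fused into one element: --mode is never recognized.
print(try_parse(["--mode training", "--region_id 1"]))
# Separate elements: parses, and "1" is converted to the int 1.
print(try_parse(["--mode", "training", "--region_id", "1"]))
# --flag=value is a single element that argparse also accepts.
print(try_parse(["--mode=training", "--region_id=1"]))
```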

The image has been pushed to Amazon ECR. I start the training job with the following boto3 call:

import boto3

session = boto3.Session(profile_name='algoprod')
client = session.client('sagemaker', region_name='us-east-1')
training_job_name = 'sagemaker-training-demo'
resp = client.create_training_job(
    TrainingJobName=training_job_name,
    RoleArn="xxxx",
    AlgorithmSpecification={
        'TrainingImage': "image:latest",
        'TrainingInputMode': "File",
        'ContainerArguments': [
            '--mode training',
            '--region_id 1',
        ],
    },
    # OutputDataConfig, ResourceConfig and StoppingCondition are also
    # required by create_training_job; they are omitted from this excerpt.
)

print(resp)

The call above successfully starts the SageMaker training job, but the job then fails with the following error:

entry_point.py: error: the following arguments are required: --mode

The --mode argument is passed through ContainerArguments, following the Boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_training_job.html
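The flag and value stay fused because nothing splits the strings on spaces. The Dockerfile uses the exec form of ENTRYPOINT, which starts `python /opt/ml/code/entry_point.py` directly and appends any extra arguments as separate argv entries, with no shell in between. Assuming SageMaker hands each ContainerArguments element to the container verbatim (which the error above is consistent with), `'--mode training'` arrives as a single argument. The sketch below imitates that locally with `subprocess` and a list-form command (no `shell=True`); the one-line child script and `received_args` are hypothetical stand-ins, not part of the original setup.

```python
import subprocess
import sys
from typing import List

# Hypothetical stand-in for entry_point.py: it only reports the
# arguments it received after the script name.
CHILD = "import sys; print(sys.argv[1:])"


def received_args(container_arguments: List[str]) -> str:
    """Run a child Python process the way an exec-form ENTRYPOINT does:
    base command plus each extra argument as its own argv entry, with
    no shell word splitting. Returns what the child printed."""
    cmd = [sys.executable, "-c", CHILD, *container_arguments]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# A string containing a space stays one argument.
print(received_args(["--mode training", "--region_id 1"]))
# Separate list elements arrive as separate arguments.
print(received_args(["--mode", "training", "--region_id", "1"]))
```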

Please advise.


Solution

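  • Figured out how it should be passed: each flag and each value must be its own list element, and every element must be a string (ContainerArguments is a list of strings, so boto3's parameter validation rejects an integer such as 1 before the request is sent).

    'ContainerArguments': ['--mode', 'training', '--region_id', '1']

  The elements are handed to the container's ENTRYPOINT as separate arguments without any shell word splitting, so '--mode training' reached argparse as one token that it did not recognize as the --mode option.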
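If the arguments start out as key/value pairs, a small helper can produce the flat, all-string list the solution uses. `to_container_arguments` is a hypothetical name and this is only a sketch: it emits one `--name value` pair per entry, in insertion order, and converts every value with `str()` so the list satisfies the string-only type of ContainerArguments.

```python
from typing import Any, Dict, List


def to_container_arguments(params: Dict[str, Any]) -> List[str]:
    """Flatten {"mode": "training", "region_id": 1} into
    ["--mode", "training", "--region_id", "1"].

    Each flag and each value becomes its own list element, and every
    value is converted to str, since ContainerArguments is a list of
    strings that is not word-split inside the container.
    """
    args: List[str] = []
    for name, value in params.items():
        args.extend([f"--{name}", str(value)])
    return args


container_arguments = to_container_arguments({"mode": "training", "region_id": 1})
print(container_arguments)

# Drop the result into AlgorithmSpecification as before ("image:latest"
# is the placeholder image URI from the question).
algorithm_specification = {
    "TrainingImage": "image:latest",
    "TrainingInputMode": "File",
    "ContainerArguments": container_arguments,
}
```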