python, amazon-web-services, deployment, amazon-sagemaker, aws, deploy

Is there a faster way to update a SageMaker endpoint when running in script mode using inference.py?


I am iterating on a model inference deployment using script mode in SageMaker (currently running in local mode) and update my inference script often. Every time I update the inference.py entry point script, I need to recreate the model instance like this:

model_instance = PyTorchModel(
    model_data=model_tar_path,
    role=role,
    source_dir="code",
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3"
)

and then call

predictor = model_instance.deploy(
    initial_instance_count=1,
    endpoint_name='some_name',
    instance_type=instance_type,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer())

over and over every time I change something. This takes super long each time because it basically starts a new Docker container (remember, I'm running locally) and waits for all the dependencies to install before I can do anything with it. And if there is any error, I have to do the whole thing all over again.

I'd like to explore any possible way to use the update_endpoint functionality so I can essentially redeploy the endpoint within the same container, without having to create a new container every time and then wait for all the dependency installations, etc.
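
For reference, the kind of in-place update I have in mind looks something like this with the SDK's Predictor API (a sketch; the values are placeholders reused from above, and whether this avoids a container restart in local mode is exactly what I'm asking):

# Hypothetical update path using the SageMaker Python SDK v2 Predictor.
# endpoint_name and instance_type are reused from the snippets above.
from sagemaker.predictor import Predictor

predictor = Predictor(endpoint_name='some_name')
predictor.update_endpoint(
    initial_instance_count=1,
    instance_type=instance_type)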


Solution

  • SageMaker local mode is designed to imitate the hosted environment. As such, a new container is started every time you deploy or update.

    For faster development, I usually bake all the packages I can into the container, which removes the need to install them on every deploy.

    That is, you can extend the SageMaker PyTorch container and bake your packages into it instead of listing them in a requirements.txt. You can then push the image to ECR and specify it in the PyTorchModel (see the Dockerfile sketch below).

    https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html
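
    A minimal Dockerfile sketch (the base image account/region/tag and the package names below are examples; look up the prebuilt image that matches your framework version, Python version, and region in the AWS docs linked above):

    # Extend the prebuilt SageMaker PyTorch inference image.
    # The account/region/tag here are illustrative -- substitute the
    # image that matches your framework version and region.
    FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py3

    # Bake dependencies into the image (example packages) so they are
    # not reinstalled from requirements.txt on every deploy.
    RUN pip install --no-cache-dir scikit-learn pandas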

    In your PyTorchModel:

    model_instance = PyTorchModel(
        image_uri=<YourImageECRURI>,
        model_data=model_tar_path,
        role=role,
        source_dir="code",
        entry_point="inference.py",
        framework_version="1.8",
        py_version="py3"
    )
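
    With the image built and pushed to ECR, each deploy reuses it and skips the per-deploy dependency installation. A usage sketch, reusing the names from your question ('local' keeps the endpoint in local mode):

    predictor = model_instance.deploy(
        initial_instance_count=1,
        endpoint_name='some_name',
        instance_type='local',
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer())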