I fine-tuned a huggingface-llm-mistral-7b model using a SageMaker JumpStartEstimator. The artifacts are stored in S3, compressed as a .tar.gz file.
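For reference, the fine-tuning step looked roughly like this (the training instance type and S3 path are placeholders, and SAGEMAKER_ROLE is the same role used in the deploy code below):

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Fine-tune the JumpStart base model; artifacts land in S3 as model.tar.gz
estimator = JumpStartEstimator(
    model_id="huggingface-llm-mistral-7b",
    model_version="*",
    role=SAGEMAKER_ROLE,
    instance_type="ml.g5.12xlarge",
)
estimator.fit({"training": "s3://path/to/training/data/"})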
Now I'm trying to deploy that model using the Python SDK. I run the following code:
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

INFERENCE_INSTANCE_TYPE = "ml.g5.2xlarge"
MODEL_ID = "huggingface-llm-mistral-7b"
MODEL_VERSION = "*"
SAGEMAKER_ROLE = "arn:aws:iam::257342474:role/AmazonSageMakerFullAccess"

endpoint_name = name_from_base(f"jumpstart-{MODEL_ID}")

# Retrieve the JumpStart inference container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=MODEL_ID,
    model_version=MODEL_VERSION,
    instance_type=INFERENCE_INSTANCE_TYPE,
)

# Retrieve the inference script bundle
deploy_source_uri = script_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, script_scope="inference"
)

# Retrieve the base model artifact URI
base_model_uri = model_uris.retrieve(
    model_id=MODEL_ID, model_version=MODEL_VERSION, model_scope="inference"
)

# Point model_data at the fine-tuned artifacts instead of the base model
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data="s3://path/to/model/model.tar.gz",
    entry_point="inference.py",
    role=SAGEMAKER_ROLE,
    predictor_cls=Predictor,
    name=endpoint_name,
)

base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=INFERENCE_INSTANCE_TYPE,
    endpoint_name=endpoint_name,
    volume_size=50,
)
And the error I get is:
OSError: [Errno 28] No space left on device
My instance should have enough RAM and disk space, so I can't figure out why this error is raised.
The error seems to come from the model-creation step, so I tried replacing the deploy call with:
model.create(instance_type=INFERENCE_INSTANCE_TYPE)
And the same error is raised.
I also tried increasing the volume size, switching to an ml.g5.12xlarge instance, and using a ServerlessInferenceConfig, all to no effect.
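For completeness, the serverless attempt looked roughly like this (the memory size and concurrency values are placeholders, not my exact settings):

from sagemaker.serverless import ServerlessInferenceConfig

# Deploy without a dedicated instance; SageMaker manages the capacity
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=1,
)
base_model_predictor = model.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name=endpoint_name,
)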
Could anyone advise on how to fix this, or how to troubleshoot the source of the error?
I recommend the following steps to investigate the issue; the solution will depend on what they reveal.
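As a first check, you can look up how large the compressed artifact actually is; a minimal sketch with boto3 (the bucket and key are placeholders for your artifact location):

import boto3

# Look up the size of the compressed model artifact in S3
s3 = boto3.client("s3")
head = s3.head_object(Bucket="your-bucket", Key="path/to/model/model.tar.gz")
print(f"Compressed artifact: {head['ContentLength'] / 1024 ** 3:.1f} GiB")

Keep in mind that the archive is downloaded and then extracted on the endpoint, so the uncompressed size is what has to fit. The endpoint's CloudWatch logs may also show more detail about what filled up.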