I would like to deploy a Hugging Face text embedding model as an endpoint via AWS SageMaker.
Here is my code so far:
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel
# sess = sagemaker.Session()
role = sagemaker.get_execution_role()
# Hub Model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L12-v2',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',  # NLP task you want to use for predictions
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,  # configuration for loading model from Hub
    role=role,  # iam role with permissions to create an Endpoint
    py_version='py36',
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",  # pytorch version used
)
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
data = {
    "inputs": ["This is an example sentence", "Each sentence is converted"]
}
result = predictor.predict(data)
print(len(result[0]))
print(result[0])
While this does deploy an endpoint successfully, it does not behave the way it should. For each string in the input list, I expect a 1x384 list of floats as output. Instead, I get a 7x384 list for each sentence. Did I maybe use the wrong pipeline?
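There are two ways to deploy Hugging Face models as SageMaker endpoints: directly from the Hub by passing HF_MODEL_ID and HF_TASK in env, as in your code, or from a model archive in S3 (model_data) that can include a custom inference script. For sentence embeddings you want the second, because the default feature-extraction pipeline returns one vector per token, which is why you see 7x384, rather than one pooled vector per sentence: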
huggingface_model = HuggingFaceModel(
    model_data=s3_location,  # path to your model and script
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version="4.37.0",  # transformers version used
    pytorch_version="2.1.0",  # pytorch version used
    py_version='py310',  # python version used
    # model_server_workers=1,
    # image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04"
    # env=hub
)
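For illustration, here is a minimal sketch of what the custom script inside the archive could look like. It assumes the script is saved as code/inference.py in model.tar.gz and that the toolkit picks up model_fn and predict_fn as overrides, as in the notebook linked below. The mean_pooling helper, the optional L2 normalization, and the "vectors" response key are my own choices for this example, not something the toolkit requires. The heavy imports are done inside the functions only so that the file can be loaded without the model dependencies installed.

```python
# inference.py: sketch of a custom handler (assumed location: code/ in model.tar.gz)


def mean_pooling(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: tensor of shape (batch, tokens, hidden)
    attention_mask:   tensor of shape (batch, tokens), 1 for real tokens
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against division by zero
    return summed / counts


def model_fn(model_dir):
    """Load the tokenizer and model from the unpacked model archive."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """Return one pooled vector per input sentence."""
    import torch

    model, tokenizer = model_and_tokenizer
    sentences = data["inputs"]
    if isinstance(sentences, str):
        sentences = [sentences]

    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)

    # Collapse the per-token vectors (e.g. 7 x 384) into one vector per sentence.
    pooled = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
    # Optional: L2-normalize so dot products behave like cosine similarity.
    pooled = torch.nn.functional.normalize(pooled, p=2, dim=1)
    return {"vectors": pooled.tolist()}
```

With a script along these lines, predictor.predict(data)["vectors"] should hold one list of 384 floats per input string for all-MiniLM-L12-v2, instead of one list per token.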
The complete reference for this approach is the custom inference script notebook: https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb
Additional info: this is the handler file that runs on each request your endpoint receives: https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/main/src/sagemaker_huggingface_inference_toolkit/handler_service.py