Tags: huggingface-transformers, huggingface

Unable to run a model using HuggingFace Inference Endpoints


I am able to make successful requests using the free endpoint, but when using Inference Endpoints, I get a 404 response. Here is the relevant piece of code:

import requests

mode = 'paid'                                              # works if 'free'
model_id = "sentence-transformers/all-MiniLM-L6-v2"
headers = {"Authorization": f"Bearer {HUGGINGFACE_TOKEN}"}

if mode == 'free':
    # This works
    api_url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{model_id}"
else:
    api_url = f"https://xxxxxxxxxxxxxxxxx.us-east-1.aws.endpoints.huggingface.cloud/{model_id}"

def get_embeddings(texts):
    response = requests.post(api_url, headers=headers, json={"inputs": texts, "options": {"wait_for_model": True}})
    return response.json()

In the web UI, the endpoint is shown as running, and I can test it there without any problem.

What am I missing?


Solution

  • As mentioned in the comments:

    1. The Inference Endpoint URL does not take a /{model_id} path; requests go to the endpoint's root URL.
    2. The task section should be filled in correctly according to your needs.

    After removing the /{model_id}, we got a 400 response with the message "list indices must be integers or slices, not str", which was caused by the wrong task: instead of producing embeddings, the endpoint was trying to compute similarities between the strings in a list. After changing the task to embeddings, the model successfully generated embeddings from a single string. For a detailed tutorial that covers the deployment process, please see Getting Started with Hugging Face Inference Endpoints.
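Putting both fixes together, the request code might look like the sketch below. The endpoint host is the placeholder from the question, `HUGGINGFACE_TOKEN` is assumed to be defined, and the endpoint's task is assumed to already be set to embeddings in the web UI:

```python
import requests

HUGGINGFACE_TOKEN = "hf_xxx"  # placeholder; use your own token

# The Inference Endpoint is called at its root URL -- no /{model_id} suffix.
# (Placeholder host copied from the question.)
api_url = "https://xxxxxxxxxxxxxxxxx.us-east-1.aws.endpoints.huggingface.cloud"
headers = {"Authorization": f"Bearer {HUGGINGFACE_TOKEN}"}

def get_embeddings(texts):
    # With the endpoint's task set to embeddings, the JSON response is the
    # embedding vector(s) for the input text(s).
    response = requests.post(api_url, headers=headers, json={"inputs": texts})
    response.raise_for_status()
    return response.json()
```

Unlike the free Serverless API, a dedicated endpoint serves exactly one model, so the model is identified by the endpoint URL itself rather than by a path segment.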