I have a simple application with a PyTorch model for predicting emotions in text. The model gets downloaded inside the container when it starts. Unfortunately, the deployment in Vertex AI fails every time with the message:
Failed to deploy model "emotion_recognition" to endpoint "emotions" due to the error: Error: model server never became ready. Please validate that your model file or container configuration are valid.
Here is my Dockerfile:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
WORKDIR /usr/src/emotions
COPY ./schemas/ /emotions/schemas
COPY ./main.py /emotions
COPY ./utils.py /emotions
ENV PORT 8080
ENV HOST "0.0.0.0"
WORKDIR /emotions
EXPOSE 8080
CMD ["uvicorn", "main:app"]
Here's my main.py:
from fastapi import FastAPI, Request
from utils import get_emotion
from schemas.schema import Prediction, Predictions, Response

app = FastAPI(title="People Analytics")

@app.get("/isalive")
async def health():
    message = "The Endpoint is running successfully"
    status = "Ok"
    code = 200
    response = Response(message=message, status=status, code=code)
    return response

@app.post("/predict",
          response_model=Predictions,
          response_model_exclude_unset=True)
async def predict_emotions(request: Request):
    body = await request.json()
    print(body)
    instances = body["instances"]
    print(instances)
    print(type(instances))
    instances = [x["text"] for x in instances]
    print(instances)
    outputs = []
    for text in instances:
        emotion = get_emotion(text)
        outputs.append(Prediction(emotion=emotion))
    return Predictions(predictions=outputs)
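For reference, Vertex AI wraps the inputs of an online prediction request in an `instances` key, so the parsing inside `predict_emotions` boils down to something like this self-contained sketch (stdlib only, with a stubbed `get_emotion` whose output values are made up for illustration):

```python
import json

# Example payload in the shape Vertex AI sends to the predict route.
raw_body = '{"instances": [{"text": "I love this!"}, {"text": "This is awful."}]}'

body = json.loads(raw_body)
texts = [instance["text"] for instance in body["instances"]]

# Stub standing in for the real get_emotion() from utils.py (hypothetical labels).
def get_emotion(text: str) -> str:
    return "joy" if "love" in text else "anger"

# Vertex AI expects the response wrapped in a "predictions" key.
predictions = {"predictions": [{"emotion": get_emotion(t)} for t in texts]}
print(predictions)
```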
I cannot see the cause of the error in Cloud Logging, so I am curious about the reason. Please check whether my health/predict routes are correct for Vertex AI, or whether there is something else I have to change.
I would recommend enabling logging when you deploy the endpoint, so that you get more meaningful information from the logs.
This issue could be due to several reasons:
Make sure that the container is configured to listen on port 8080. Vertex AI sends liveness checks, health checks, and prediction requests to this port on the container, so your container's HTTP server must listen for requests on it.
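In your Dockerfile, `CMD ["uvicorn", "main:app"]` passes no `--host`/`--port`, so uvicorn falls back to its defaults of 127.0.0.1:8000 and never listens on 8080; the `ENV PORT`/`ENV HOST` values are not picked up automatically by a plain `uvicorn` command. Vertex AI injects the serving port into custom containers via the `AIP_HTTP_PORT` environment variable. A minimal sketch of resolving the port that way (falling back to `PORT` and then 8080 is my assumption for local runs):

```python
import os

def get_port(env) -> int:
    """Resolve the serving port: Vertex AI injects AIP_HTTP_PORT into
    custom containers; PORT and the 8080 fallback are assumptions for
    running the container outside Vertex AI."""
    return int(env.get("AIP_HTTP_PORT", env.get("PORT", "8080")))

# Bind on all interfaces; uvicorn's default 127.0.0.1 is unreachable
# from outside the container, which makes health checks fail.
host = "0.0.0.0"
port = get_port(os.environ)

# The equivalent Dockerfile CMD would be something like:
#   CMD uvicorn main:app --host 0.0.0.0 --port ${AIP_HTTP_PORT:-8080}
print(f"would serve on {host}:{port}")
```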
Make sure that you have the required permissions. You can follow the GCP documentation for this, and also validate that the account you are using has enough permissions to read your project's GCS bucket.
Vertex AI has some quota limits; to verify this you can also follow the GCP documentation.
As per the documentation, Vertex AI selects the default prediction and health routes if you did not specify them.
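If you would rather point Vertex AI at your custom `/isalive` and `/predict` routes explicitly, the `google-cloud-aiplatform` Python SDK accepts them at model upload time. A sketch under the assumption that the caller supplies the container image URI; the upload call is kept inside a function so nothing runs without GCP credentials:

```python
HEALTH_ROUTE = "/isalive"   # matches the @app.get("/isalive") handler
PREDICT_ROUTE = "/predict"  # matches the @app.post("/predict") handler

def upload_model(image_uri: str):
    """Register the custom container with explicit routes and port.
    Requires the google-cloud-aiplatform package and GCP credentials."""
    from google.cloud import aiplatform
    return aiplatform.Model.upload(
        display_name="emotion_recognition",
        serving_container_image_uri=image_uri,
        serving_container_predict_route=PREDICT_ROUTE,
        serving_container_health_route=HEALTH_ROUTE,
        serving_container_ports=[8080],
    )
```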
If none of the suggestions above work, you will need to contact GCP Support by creating a support case, since it is impossible for the community to troubleshoot this further without access to internal GCP resources.