Tags: docker, tensorflow, google-cloud-platform, tensorflow-serving, google-cloud-ml

Unable to create a version in Cloud AI Platform using custom containers for prediction


Because of certain VPC restrictions I am forced to use custom containers for prediction for a model trained on TensorFlow. According to the documentation requirements, I have created an HTTP server using TensorFlow Serving. The Dockerfile used to build the image is as follows:

FROM tensorflow/serving:2.3.0-gpu

# copy the model file
ENV MODEL_NAME=my_model
COPY my_model /models/my_model

Where my_model contains the SavedModel inside a folder named 1/ (the model version directory).
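
For reference, TensorFlow Serving expects a numbered version directory containing the exported model, so the copied tree should look roughly like this (the variables file names depend on the export, and assets/ is only present if the model uses assets):

my_model/
└── 1/
    ├── saved_model.pb
    ├── assets/
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index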

I have then pushed the container image to Artifact Registry and created a Model. To create a Version, I selected Custom Container on the Cloud Console UI and added the path to the container image. I set the Prediction route and the Health route to /v1/models/my_model:predict and changed the Port to 8501. I also selected a single compute node of machine type n1-standard-16 with 1 P100 GPU, and kept the scaling set to auto scaling.
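
For reference, the same Version can also be created from the command line instead of the Console. This is only a sketch of the same configuration: the project, region and image path are placeholders, and the flag names come from the custom-container beta of gcloud, so they should be verified against gcloud beta ai-platform versions create --help:

# placeholder project/repo/image path in Artifact Registry
gcloud beta ai-platform versions create v1 \
  --model=my_model \
  --region=us-central1 \
  --image=us-central1-docker.pkg.dev/MY_PROJECT/MY_REPO/my_model_image:latest \
  --ports=8501 \
  --predict-route=/v1/models/my_model:predict \
  --health-route=/v1/models/my_model:predict \
  --machine-type=n1-standard-16 \
  --accelerator=count=1,type=nvidia-tesla-p100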

After clicking Save, I can see the TensorFlow model server starting, and the logs show the following messages:

Successfully loaded servable version {name: my_model version: 1}

Running gRPC ModelServer at 0.0.0.0:8500

Exporting HTTP/REST API at:localhost:8501

NET_LOG: Entering the event loop

However, after about 20-25 minutes the version creation just stops, with the following error:

Error: model server never became ready. Please validate that your model file or container configuration are valid.

I am unable to figure out why this is happening. I can run the same Docker image on my local machine and successfully get predictions by hitting the endpoint that is created: http://localhost:8501/v1/models/my_model:predict
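
For reference, this is roughly how I test the image locally. The image tag and the request payload are placeholders (the instances shape depends on the model), and --gpus all assumes the NVIDIA container toolkit is installed since this is a GPU image:

# start the container locally, mapping TF Serving's default REST port
docker run --rm --gpus all -p 8501:8501 my_model_image:latest

# in another shell: model status, then a prediction request
curl http://localhost:8501/v1/models/my_model
curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -d '{"instances": [[1.0, 2.0, 3.0]]}'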

Any help in this regard will be appreciated.


Solution

  • Answering this myself after working with the Google Cloud Support Team to figure out the error.

    It turns out the port I was creating the Version on was conflicting with the Kubernetes deployment on Cloud AI Platform's side. So I changed the Dockerfile to the following and was able to successfully run Online Predictions on both Classic AI Platform and Unified AI Platform (a quick local check follows the Dockerfile):

    FROM tensorflow/serving:2.3.0-gpu
    
    # Set where models should be stored in the container
    ENV MODEL_BASE_PATH=/models
    RUN mkdir -p ${MODEL_BASE_PATH}
    
    # copy the model file
    ENV MODEL_NAME=my_model
    COPY my_model /models/my_model
    
    EXPOSE 5000
    
    EXPOSE 8080
    
    CMD ["tensorflow_model_server", "--rest_api_port=8080", "--port=5000", "--model_name=my_model", "--model_base_path=/models/my_model"]