Because of certain VPC restrictions I am forced to use custom containers for predictions for a model trained on TensorFlow. According to the documentation requirements, I have created an HTTP server using TensorFlow Serving. The Dockerfile used to build the image is as follows:
FROM tensorflow/serving:2.4.1-gpu
ENV MODEL_NAME=my_model
# copy the model file into the path TensorFlow Serving loads by default (/models/${MODEL_NAME})
COPY my_model /models/my_model
Where my_model contains the SavedModel inside a folder named 1/.
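For reference, the build context is laid out roughly as follows (the file names are the standard SavedModel contents, shown here only for illustration):

my_model/
  1/
    saved_model.pb
    variables/
      variables.data-00000-of-00001
      variables.index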
I have then pushed this image to Google Container Registry and created a Model using Import an existing custom container, changing the Port to 8501. However, when trying to deploy the model to an endpoint using a single compute node of type n1-standard-16 with 1 P100 GPU, the deployment runs into the error below:
Failed to create session: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
I am unable to figure out how this is happening. I am able to run the same Docker image on my local machine and successfully get predictions by hitting the endpoint that is created: http://localhost:8501/v1/models/my_model:predict
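For context, the local check looks roughly like this (the image tag my-serving-image and the request payload are placeholders; the actual instances depend on the model's input signature):

docker run --rm -p 8501:8501 my-serving-image
curl -X POST -d '{"instances": [[1.0, 2.0, 5.0]]}' http://localhost:8501/v1/models/my_model:predict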
Any help in this regard will be appreciated.
The issue has been solved by downgrading the TensorFlow Serving image to the 2.3.0-gpu version. As the error message suggests, the CUDA runtime bundled in the 2.4.1-gpu image requires a newer NVIDIA driver than the one available on the GCP AI Platform prediction nodes, so an image built against an older CUDA runtime is needed.
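With that change, the Dockerfile above stays the same except for the base image:

FROM tensorflow/serving:2.3.0-gpu
ENV MODEL_NAME=my_model
# copy the model file into the path TensorFlow Serving loads by default (/models/${MODEL_NAME})
COPY my_model /models/my_model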