Because of certain VPC restrictions I am forced to use custom containers for predictions for a model trained on TensorFlow. According to the documentation requirements, I have created an HTTP server using TensorFlow Serving. The Dockerfile used to build the image is as follows:
FROM tensorflow/serving:2.4.1-gpu
ENV MODEL_NAME=my_model
# copy the model file into the path TensorFlow Serving loads by default (/models/${MODEL_NAME})
COPY my_model /models/my_model
Where my_model contains the SavedModel inside a folder named 1/.
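For reference, the build context is laid out roughly as follows (the file names are the standard SavedModel contents, shown here only for illustration):

my_model/
  1/
    saved_model.pb
    variables/
      variables.data-00000-of-00001
      variables.index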
I have then pushed this image to Google Container Registry and created a Model using Import an existing custom container, changing the Port to 8501. However, when trying to deploy the model to an endpoint using a single compute node of type n1-standard-16 with 1 P100 GPU, the deployment runs into the error below:
Failed to create session: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
I am unable to figure out how this is happening. I am able to run the same Docker image on my local machine and successfully get predictions by hitting the endpoint that is created: http://localhost:8501/v1/models/my_model:predict
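For context, the local check looks roughly like this (the image tag my-serving-image and the request payload are placeholders; the actual instances depend on the model's input signature):

docker run --rm -p 8501:8501 my-serving-image
curl -X POST -d '{"instances": [[1.0, 2.0, 5.0]]}' http://localhost:8501/v1/models/my_model:predict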
Any help in this regard will be appreciated.
The issue has been solved by downgrading the TensorFlow Serving image to the 2.3.0-gpu version. As the error message suggests, the CUDA runtime bundled in the 2.4.1-gpu image requires a newer NVIDIA driver than the one available on the GCP AI Platform prediction nodes, so an image built against an older CUDA runtime is needed.
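With that change, the Dockerfile above stays the same except for the base image:

FROM tensorflow/serving:2.3.0-gpu
ENV MODEL_NAME=my_model
# copy the model file into the path TensorFlow Serving loads by default (/models/${MODEL_NAME})
COPY my_model /models/my_model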