I'm trying to deploy a TorchServe instance on the Google Vertex AI platform, but per its documentation (https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#response_requirements), responses must have the following shape:
{
    "predictions": PREDICTIONS
}
Where PREDICTIONS is an array of JSON values representing the predictions that your container has generated.
Unfortunately, when I try to return that shape from the postprocess() method of my custom handler, like so:
def postprocess(self, data):
    return {
        "predictions": data
    }
TorchServe returns:
{
    "code": 503,
    "type": "InternalServerException",
    "message": "Invalid model predict output"
}
Please note that data is a list of lists, for example [[1, 2, 1], [2, 3, 3]] (basically, I am generating embeddings from sentences).
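So, for the example data above, the response body Vertex AI expects from the container would presumably be:

{
    "predictions": [[1, 2, 1], [2, 3, 3]]
}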
Now, if I simply return data (and not a Python dictionary), it works with TorchServe, but when I deploy the container on Vertex AI, it returns the following error: ModelNotFoundException. I assume Vertex AI throws this error because the return shape does not match what's expected (cf. the documentation).
Did anybody successfully manage to deploy a TorchServe instance with a custom handler on Vertex AI?
Actually, making sure that TorchServe correctly processes the input dictionary (instances) solved the issue. It seems that what's in the article did not work for me.
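In case it helps, here is a minimal sketch of the kind of handler that worked for me. It assumes batch size 1, and the class name EmbeddingHandler as well as the self.model.encode() call are illustrative placeholders, not my actual model code:

import json

from ts.torch_handler.base_handler import BaseHandler


class EmbeddingHandler(BaseHandler):
    def preprocess(self, data):
        # Vertex AI sends {"instances": [...]} as the request body, and
        # TorchServe hands that body to the handler under "body" (or "data").
        body = data[0].get("body") or data[0].get("data")
        if isinstance(body, (bytes, bytearray)):
            body = json.loads(body)
        return body["instances"]

    def inference(self, instances):
        # Illustrative only: produce one embedding per input sentence.
        return [self.model.encode(sentence) for sentence in instances]

    def postprocess(self, data):
        # TorchServe expects a list with one entry per request in the
        # batch, so the dict is wrapped in a single-element list. The
        # serialized entry is then exactly the {"predictions": ...} body
        # that Vertex AI requires.
        return [{"predictions": data}]

The two points that mattered for me were unwrapping "instances" in preprocess() and returning a list (not a bare dict) from postprocess(); as far as I can tell, the original return of a bare dict is what triggered "Invalid model predict output", since TorchServe cannot match a non-list output to the requests in the batch.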