I am currently trying to deploy a custom model to AI Platform by following https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud_1. The model is a combination of a pre-trained model from PyTorch and torchvision.transform. I keep getting the error below, which appears to be related to the 500 MB constraint on custom prediction routines:
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.
setup.py:
from setuptools import setup
from pathlib import Path

base = Path(__file__).parent
REQUIRED_PACKAGES = [line.strip() for line in open(base / "requirements.txt")]
print(f"\nPackages: {REQUIRED_PACKAGES}\n\n")

# [torch==1.3.0, torchvision==0.4.1, ImageHash==4.2.0,
#  Pillow==6.2.1, pyvis==0.1.8.2] installs ~800 MB worth of files

setup(
    description="Extract features of an image",
    author='<>',
    name='test',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    project_urls={
        'Documentation': 'https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines#tensorflow',
        'Deploy': 'https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud_1',
        'AI Platform troubleshooting': 'https://cloud.google.com/ai-platform/training/docs/troubleshooting',
        'Say Thanks!': 'https://medium.com/searce/deploy-your-own-custom-model-on-gcps-ai-platform-7e42a5721b43',
        'Google Torch wheels': 'http://storage.googleapis.com/cloud-ai-pytorch/readme.txt',
        'Torch & torchvision wheels': 'https://download.pytorch.org/whl/torch_stable.html',
    },
    python_requires='~=3.7',
    scripts=['predictor.py', 'preproc.py'],
)
Steps taken: I tried adding 'torch' and 'torchvision' directly to the REQUIRED_PACKAGES list in setup.py so that PyTorch and torchvision would be installed as dependencies at deployment time (see the sketch below). My guess is that AI Platform internally downloads the PyPI package for PyTorch, which is 500+ MB, and this causes the model deployment to fail. If I deploy the model with 'torch' only, it seems to work (though of course it throws an error about not being able to find the 'torchvision' library).
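Concretely, the attempt looked roughly like this in setup.py (version pins taken from the comment above; these resolve to the default PyPI builds, which is where the 500+ MB comes from):

REQUIRED_PACKAGES = [line.strip() for line in open(base / "requirements.txt")] + \
    ['torch==1.3.0', 'torchvision==0.4.1']  # plain PyPI packages; torch alone is 500+ MB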
File size
In total, the model dependencies should be less than 250 MB, but I still keep getting this error. I have also tried the torch and torchvision wheels provided by Google's mirrored packages (http://storage.googleapis.com/cloud-ai-pytorch/readme.txt), but the same memory issue persists. AI Platform is quite new to us and we would appreciate some input from professionals.
GCP CLI Input:
My environment variable:
BUCKET_NAME="something"
MODEL_DIR=<>
VERSION_NAME='v6'
MODEL_NAME="something_model"
STAGING_BUCKET=$MODEL_DIR<>
# TORCH_PACKAGE=$MODEL_DIR"package/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl"
# TORCHVISION_PACKAGE=$MODEL_DIR"package/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"
TORCH_PACKAGE=<>
TORCHVISION_PACKAGE=<>
CUSTOM_CODE_PATH=$STAGING_BUCKET"imt_ai_predict-0.1.tar.gz"
PREDICTOR_CLASS="predictor.MyPredictor"
REGION=<>
MACHINE_TYPE='mls1-c4-m2'
gcloud beta ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--origin=$MODEL_DIR \
--runtime-version=2.3 \
--python-version=3.7 \
--machine-type=$MACHINE_TYPE \
--package-uris=$CUSTOM_CODE_PATH,$TORCH_PACKAGE,$TORCHVISION_PACKAGE \
--prediction-class=$PREDICTOR_CLASS
GCP CLI Output:
**[1] global**
[2] asia-east1
[3] asia-northeast1
[4] asia-southeast1
[5] australia-southeast1
[6] europe-west1
[7] europe-west2
[8] europe-west3
[9] europe-west4
[10] northamerica-northeast1
[11] us-central1
[12] us-east1
[13] us-east4
[14] us-west1
[15] cancel
Please enter your numeric choice: 1
To make this the default region, run `gcloud config set ai_platform/region global`.
Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: **Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.**
My finding: I have found articles from people struggling in the same way with the PyTorch package who made it work by putting the torch wheels on GCS (https://medium.com/searce/deploy-your-own-custom-model-on-gcps-ai-platform-7e42a5721b43). I have tried the same approach with torch and torchvision but no luck so far, and I am waiting for a response from cloudml-feedback@google.com. Any help on getting a custom torch/torchvision based predictor working on AI Platform would be great.
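For reference, that approach boils down to downloading the CPU-only wheels from https://download.pytorch.org/whl/torch_stable.html, copying them into the model bucket, and pointing --package-uris at them. A rough sketch (the wheel filenames follow the commented-out variables above and are assumptions about the exact builds needed):

# copy locally downloaded CPU-only wheels into the model bucket
gsutil cp torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl $MODEL_DIR"package/"
gsutil cp torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl $MODEL_DIR"package/"

# then reference them when creating the version
TORCH_PACKAGE=$MODEL_DIR"package/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl"
TORCHVISION_PACKAGE=$MODEL_DIR"package/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"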
Got this fixed by a combination of a few things. I stuck with the 4-core MLS1 machine (mls1-c4-m2 from above) and a custom prediction routine, keeping everything under the 500 MB limit:
# In setup.py: pull the much smaller CPU-only torch wheel instead of the
# default PyPI torch package, and let torchvision come from PyPI
REQUIRED_PACKAGES = [line.strip() for line in open(base / "requirements.txt")] + \
    ['torchvision==0.5.0',
     'torch @ https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl']
# send the data to the predictor class as plain JSON
import json
payload = json.dumps(your_data)  # your_data = whatever you pass to the predictor class
# import only what is needed from torch instead of the whole package
from torch import zeros, load
# ... your prediction code ...
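Putting those together, a minimal sketch of what predictor.py can look like, following the custom prediction routine interface from the docs linked above (the model filename, preprocessing, and output handling here are placeholders, not my actual code):

import os

# import only what is needed from torch rather than the whole namespace
from torch import load, zeros, no_grad


class MyPredictor(object):
    """Custom prediction routine (interface from the AI Platform docs linked above)."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # `instances` arrives as already-deserialized JSON (a list of inputs)
        results = []
        with no_grad():
            for instance in instances:
                # placeholder preprocessing: build a tensor from the JSON payload
                tensor = zeros(1, 3, 224, 224)  # replace with real preprocessing of `instance`
                output = self._model(tensor)
                results.append(output.tolist())
        return results

    @classmethod
    def from_path(cls, model_dir):
        # the file name "model.pt" is an assumption; load whatever was exported at training time
        model = load(os.path.join(model_dir, "model.pt"), map_location="cpu")
        model.eval()
        return cls(model)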
I haven't tested different ways of serializing the trained model object; there could be a difference there as well in which one (torch.save, pickle, joblib, etc.) saves the most memory.
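For example, the two obvious alternatives look like this (an illustration of the options, not something I benchmarked; MyModel is a placeholder for your own model class):

import pickle
import torch

# Option 1: save only the learned weights (state_dict); smallest file, but you
# must re-create the model class before loading
torch.save(model.state_dict(), "model_state.pt")
model = MyModel()  # hypothetical model class
model.load_state_dict(torch.load("model_state.pt", map_location="cpu"))

# Option 2: pickle the whole model object; simpler to load, but ties the file
# to the exact class/code layout used at save time
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)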
I also found this link for those whose organization is a partner with GCP; they might be able to request a higher quota (I am guessing from 500 MB to 2 GB or so). I didn't have to go in this direction, as my issue was resolved (and other ones popped up, lol): https://cloud.google.com/ai-platform/training/docs/quotas