tensorflow keras gcloud gcp-ai-platform-training

Submit a Keras training job to Google Cloud


I am trying to follow this tutorial: https://medium.com/@natu.neeraj/training-a-keras-model-on-google-cloud-ml-cb831341c196

to upload and train a Keras model on Google Cloud Platform, but I can't get it to work.

Right now I have downloaded the package from GitHub, and I have created a cloud environment with AI-Platform and a bucket for storage.

I am uploading the files (with the suggested folder structure) to my Cloud Storage bucket (basically to the root of my storage), and then trying the following command in the cloud terminal:

gcloud ai-platform jobs submit training JOB1 \
  --module-name=trainer.cnn_with_keras \
  --package-path=./trainer \
  --job-dir=gs://mykerasstorage \
  --region=europe-north1 \
  --config=gs://mykerasstorage/trainer/cloudml-gpu.yaml

But I get errors. First, the cloudml-gpu.yaml file can't be found: the error says "no such folder or file". When I leave it out, I get another error saying the __init__.py file is missing. It isn't missing, although it is empty (as it was when I downloaded it from the tutorial's GitHub repository). I am guessing I haven't uploaded the files the right way.

Any suggestions on how I should do this? The tutorial itself gives no information on this step.

I have read in another guide that gcloud can package and upload the job directly, but I am not sure how to do this or where to run the commands: in my local terminal with the gcloud command, or in Cloud Shell in the browser? And how do I specify the path where my Python files are located?

I should mention that I am working on a Mac and am pretty new to Keras and Python.


Solution

  • The GPU issue is solved now. It turned out to be something simple: my Google Cloud account had GPUs disabled and needed to be upgraded.
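
On the packaging question, here is a minimal sketch rather than a confirmed fix. It assumes the command is run from a terminal (local or Cloud Shell) whose current directory contains the trainer/ folder, because --package-path and --config are read from the local filesystem and only --job-dir points at the bucket. The bucket, region, and module name are taken from the question; the timestamped job name, the SUBMIT switch, and the submit_job helper are hypothetical names added for illustration.

```shell
# Sketch only: run from the directory that contains trainer/
# (trainer/__init__.py, trainer/cnn_with_keras.py, trainer/cloudml-gpu.yaml).

# Hypothetical timestamped job name, so that repeated submissions do not collide.
JOB_NAME="keras_job_$(date +%Y%m%d_%H%M%S)"
BUCKET="gs://mykerasstorage"          # bucket from the question
REGION="europe-north1"                # region from the question
CONFIG="./trainer/cloudml-gpu.yaml"   # local path, not a gs:// URL

# Hypothetical helper: checks the local package, then submits the job.
submit_job() {
  # The package marker must exist locally; an empty file is enough.
  if [ ! -f ./trainer/__init__.py ]; then
    echo "trainer/__init__.py not found in $(pwd)" >&2
    return 1
  fi
  # gcloud packages ./trainer and uploads it for the job.
  gcloud ai-platform jobs submit training "$JOB_NAME" \
    --module-name=trainer.cnn_with_keras \
    --package-path=./trainer \
    --job-dir="$BUCKET" \
    --region="$REGION" \
    --config="$CONFIG"
}

# SUBMIT is a hypothetical switch: nothing is sent unless SUBMIT=1 is set.
if [ "${SUBMIT:-0}" = "1" ]; then
  submit_job
else
  echo "Dry run: set SUBMIT=1 to submit $JOB_NAME"
fi
```

Saved as, say, submit.sh (a hypothetical file name), it can be run with `SUBMIT=1 sh submit.sh`. With this layout the trainer/ files do not need to be copied to the bucket by hand first.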