Tags: encryption, google-cloud-platform, google-cloud-storage, google-cloud-ml-engine

Providing decryption key with gcloud jobs submit training


I have successfully trained my first network with the Google Cloud ML Engine, and now I am trying to make the setup a bit more secure by providing my own encryption key for the data. As explained in the manual, I have copied my data to Cloud Storage with my own customer-supplied encryption key, instead of storing it there unencrypted.
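For reference, the upload looked roughly like this (bucket name and key path are made up; the encryption_key option takes the base64-encoded AES-256 key):

    # Supply the customer-supplied encryption key (CSEK) to gsutil when uploading.
    CSEK=$(cat csek.key)   # base64-encoded AES-256 key
    gsutil -o "GSUtil:encryption_key=${CSEK}" cp -r ./training-data gs://my-bucket/training-data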

However, my setup now (obviously!) breaks, because the Python code I submit to the ML Engine cannot decrypt the files. I expected an option like --decrypt-key on gcloud ml-engine jobs submit training, but I cannot find such an option. How do I provide the key so that my code can decrypt the data?


Solution

Short answer: you should not pass the decryption key into the training job. Instead, see https://cloud.google.com/kms/docs/store-secrets

Long answer: while you could technically pass the decryption key as a flag in the training job definition, that would expose it to anyone with permission to list training jobs. Instead, store the key in the Google Cloud Key Management Service (KMS) and give the service account that runs the ML training job permission to fetch it from there. A rough setup is sketched below.
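A minimal sketch of that setup, assuming the Cloud SDK is installed and using made-up names (training-secrets key ring, csek-wrapper key, my-bucket bucket). KMS does not store the key as-is, so the sketch wraps (encrypts) the CSEK with a KMS key and stores only the ciphertext in Cloud Storage:

    # Create a key ring and a key to wrap the CSEK with (names are hypothetical).
    gcloud kms keyrings create training-secrets --location us-central1
    gcloud kms keys create csek-wrapper --location us-central1 \
        --keyring training-secrets --purpose encryption

    # Wrap the CSEK and upload only the ciphertext to Cloud Storage.
    gcloud kms encrypt --location us-central1 --keyring training-secrets --key csek-wrapper \
        --plaintext-file csek.key --ciphertext-file csek.key.enc
    gsutil cp csek.key.enc gs://my-bucket/secrets/

    # Let the service account that runs the training job unwrap it.
    gcloud kms keys add-iam-policy-binding csek-wrapper --location us-central1 \
        --keyring training-secrets \
        --member serviceAccount:SERVICE_ACCOUNT_EMAIL \
        --role roles/cloudkms.cryptoKeyDecrypter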

You can determine which service account runs the training job by following the procedure at https://cloud.google.com/ml-engine/docs/how-tos/working-with-data#using_a_cloud_storage_bucket_from_a_different_project
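In practice that comes down to looking up your project number; my understanding (please verify against the linked page) is that the account follows the pattern shown here:

    # Look up the project number and derive the Cloud ML service account name.
    # The account format below is an assumption -- confirm it against the linked documentation.
    PROJECT_NUMBER=$(gcloud projects describe my-project --format='value(projectNumber)')
    echo "service-${PROJECT_NUMBER}@cloud-ml.google.com.iam.gserviceaccount.com"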

Edit: also note what Alexey says in the comment below: TensorFlow currently cannot read and decrypt the files directly from GCS, so you will need to copy them to local disk on every worker, supplying the key to gsutil cp.
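On each worker that could look roughly like this (again, names and paths are hypothetical, and it assumes gcloud and gsutil are available on the workers):

    # Download the wrapped key and unwrap it with Cloud KMS.
    gsutil cp gs://my-bucket/secrets/csek.key.enc /tmp/csek.key.enc
    gcloud kms decrypt --location us-central1 --keyring training-secrets --key csek-wrapper \
        --ciphertext-file /tmp/csek.key.enc --plaintext-file /tmp/csek.key

    # Copy the CSEK-encrypted training data to local disk, handing the key to gsutil.
    CSEK=$(cat /tmp/csek.key)
    gsutil -o "GSUtil:decryption_key1=${CSEK}" cp -r gs://my-bucket/training-data /tmp/training-data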