Tags: google-cloud-platform, google-cloud-dataproc, google-hadoop

Google Hadoop Filesystem Encryption


In normal operation, one can provide customer-supplied encryption keys to the Google Cloud Storage API to encrypt a given bucket/blob: https://cloud.google.com/compute/docs/disks/customer-supplied-encryption
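For reference, here is a rough sketch of what I mean, using the google-cloud-storage Java client (the bucket name, object name, and Base64-encoded AES-256 key below are placeholders, not values from my actual setup):

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.BlobInfo;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;

    public class CsekUploadExample {
      public static void main(String[] args) {
        // Placeholder names; the key must be a Base64-encoded 256-bit AES key.
        String bucketName = "somebucket";
        String objectName = "output/part-00000.json";
        String base64Aes256Key = "BASE64_ENCODED_AES256_KEY";

        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucketName, objectName)).build();

        // Server-side encryption of the object with a customer-supplied key (CSEK).
        storage.create(
            blobInfo,
            "{\"example\": true}".getBytes(),
            Storage.BlobTargetOption.encryptionKey(base64Aes256Key));
      }
    }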

Is this possible for the output of spark/hadoop jobs "on the fly"?

Say we wanted to encrypt the output of a spark write

 df.write().format("json").save("gs:///somebucket/output");

The GCS connector's default configuration (https://storage.googleapis.com/hadoop-conf/gcs-core-default.xml) does not expose a property for specifying an encryption key.

Is this possible to do?


Solution

  • If you're asking whether customer-supplied encryption keys are currently available on Cloud Dataproc, the answer is no. Here is a list of the current product options for encryption at rest on Google Cloud.

    If you were just looking to encrypt the output of a Spark write, you could still encrypt it at the application layer using Google's Cloud KMS; a minimal sketch follows below. Here's a codelab for doing so with Google Cloud Storage (which looks like what you're doing with the command above). Note that customer content is encrypted at rest on Google Cloud Platform by default at the storage layer, so this is an additional layer of protection.
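    A minimal sketch of the application-layer approach with the Cloud KMS and Cloud Storage Java clients; the project, key ring, key, and bucket names are placeholders you would replace with your own:

        import com.google.cloud.kms.v1.CryptoKeyName;
        import com.google.cloud.kms.v1.EncryptResponse;
        import com.google.cloud.kms.v1.KeyManagementServiceClient;
        import com.google.cloud.storage.BlobId;
        import com.google.cloud.storage.BlobInfo;
        import com.google.cloud.storage.Storage;
        import com.google.cloud.storage.StorageOptions;
        import com.google.protobuf.ByteString;

        public class KmsEncryptThenUpload {
          public static void main(String[] args) throws Exception {
            // Placeholder project/location/key ring/key names.
            CryptoKeyName keyName =
                CryptoKeyName.of("my-project", "global", "my-key-ring", "my-key");

            try (KeyManagementServiceClient kms = KeyManagementServiceClient.create()) {
              // Encrypt the payload at the application layer before it leaves the job.
              ByteString plaintext = ByteString.copyFromUtf8("{\"example\": true}");
              EncryptResponse response = kms.encrypt(keyName.toString(), plaintext);

              // Upload only the ciphertext to Cloud Storage.
              Storage storage = StorageOptions.getDefaultInstance().getService();
              BlobInfo blobInfo =
                  BlobInfo.newBuilder(BlobId.of("somebucket", "output/part-00000.json.enc")).build();
              storage.create(blobInfo, response.getCiphertext().toByteArray());
            }
          }
        }

    In a Spark job you would apply the same idea per partition or per record (for example, encrypting each serialized record before writing), rather than uploading a single object as in this sketch.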