Search code examples
google-cloud-platformgoogle-cloud-dataflowgoogle-cloud-kms

Reading a KMS encrypted file from Google Cloud Dataflow


I went through this Google Cloud Documentation, which mentions that :-

Dataflow can access sources and sinks that are protected by Cloud KMS keys without you having to specify the Cloud KMS key of those sources and sinks, as long as you are not creating new objects.

I have a few questions regarding this:

Q.1. Does this mean we don't need to decrypt the encrypted source file within our Beam code ? Does Dataflow has this functionality built-in?

Q.2. If the source file is encrypted, will the output file from Dataflow be encrypted by default with the same key (let us say we have a symmetric key) ?

Q.3. What are the objects that are being referred here?

PS: I want to read from an encrypted AVRO file placed in the GCS bucket, apply my Apache Beam Transforms from my code and write an encrypted file back to the bucket.


Solution

  • Cloud Dataflow is a fully managed service where if encryption is not specified, it automatically applies Cloud KMS encryption. Cloud KMS is cloud hosted key management service that can manage both symmetric and asymmetric cryptographic keys.

    • When Cloud KMS is used with Cloud Dataflow, it allows you to encrypt the data that is to be processed in the Dataflow pipeline. Using Cloud KMS, the data that is temporarily stored in temporary storage like Persistent Disk can be encrypted to get end-to-end protection of data. You need not to decrypt the source file within the beam code as data from the sources is encrypted and decryption will be done automatically by the Dataflow.

    • If you are using a symmetric key, then a single key can be used for both encryption and decryption of the data which is managed by Cloud KMS stored in ciphertext. If you are using an asymmetric key, then a public key will be used to encrypt the data and a private key will be used to decrypt the data. You need to provide Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataflow service account before performing encryption and decryption. Cloud KMS automatically determines the key for decryption based on the provided ciphertext so no need to take extra care for decryption.

    • The objects that you have mentioned which are encrypted by the Cloud KMS can be tables in BigQuery, files in Cloud Storage and different data in the sources and sinks.

    For more information you can check this blog.