google-cloud-platform, google-bigquery, google-cloud-dataflow, apache-beam, service-accounts

Cloud Dataflow job reading from one BigQuery project and writing to another BigQuery project


I'm implementing a Cloud Dataflow job on GCP that needs to work with two GCP projects. Both the input and the output are partitioned BigQuery tables. The issue I'm running into is that I must read data from project A and write it into project B.

I haven't seen anything related to cross-project service accounts, and I can't give Dataflow two different credential keys either, which is a bit annoying. Has anyone else worked with this kind of architecture, and how did you deal with it?


Solution

  • I think you can accomplish this with the following steps:

    1. Create a dedicated service account in the project running the Dataflow job.
    2. Grant the service account the Dataflow Worker and BigQuery Job User roles. The service account might need additional roles based on the full resource needs of the Dataflow job.
    3. In Project A, grant the service account the BigQuery Data Viewer role to either the entire project or to specific datasets.
    4. In Project B, grant the service account the BigQuery Data Editor role to either the entire project or to specific datasets.
    5. When you start the Dataflow job, override the service account pipeline option, supplying the new service account (see the sketch after this list).
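
For illustration, here is a minimal Apache Beam (Python) sketch of the setup above. The project IDs, dataset and table names, bucket, and service account email are all hypothetical placeholders; the only cross-project parts are the fully qualified table names and the `service_account_email` pipeline option.

```python
# Minimal sketch, assuming hypothetical names: project-a (source),
# project-b (runs the job and holds the target table), and a service
# account granted the roles listed in steps 1-4 above.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(
        runner="DataflowRunner",
        project="project-b",                  # project that runs the Dataflow job
        region="us-central1",
        temp_location="gs://my-bucket/temp",  # bucket the service account can write to
        # Step 5: override the worker service account.
        service_account_email="dataflow-cross@project-b.iam.gserviceaccount.com",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            # Read from the partitioned table in project A
            # (requires BigQuery Data Viewer on project A).
            | "ReadFromProjectA" >> beam.io.ReadFromBigQuery(
                table="project-a:source_dataset.events"
            )
            # Write into the partitioned table in project B
            # (requires BigQuery Data Editor on project B).
            | "WriteToProjectB" >> beam.io.WriteToBigQuery(
                table="project-b:target_dataset.events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```

If you are using the Java SDK instead, the equivalent pipeline option is `--serviceAccount=<service-account-email>`; the IAM setup is the same.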