google-cloud-platform, google-bigquery, permissions, google-cloud-dataflow, identity-management

GCP Dataflow: pipe data from a BigQuery instance in a different project


I have a GCP Dataflow job that pulls data from one GCP project (Source) and pushes it to a different project (Dest). The job will run inside the Dest project.

The problem is that when I try running this query, I get:

  "error": {
"code": 403,
"message": "Access Denied: Project Dest: User does not have bigquery.jobs.create permission in project Dest.",
"errors": [
  {
    "message": "Access Denied: Project Dest: User does not have bigquery.jobs.create permission in project Dest.",
    "domain": "global",
    "reason": "accessDenied"
  }
],
"status": "PERMISSION_DENIED"

} }

I'm running the query from Dataflow with this pipeline step:


            | "ReadBQView"
            >> beam.io.ReadFromBigQuery(
                query=BIGQUERY_QUERY, use_standard_sql=True
            )

The BIGQUERY_QUERY string is a valid SQL statement with the following FROM clause:

  FROM
    `source.logging.endpoints`

where source is the ID of the Source project that hosts the BigQuery dataset.

I'm impersonating a Service account to run this job. I've tried granting it the BigQuery Admin role in the Dest project just to see if that did anything, but I'm still getting the same error.

What's going wrong here? Why isn't the ReadFromBigQuery step in my Dataflow job running the query in the Source project?


Solution

  • You have to create a Service Account in the Dest project, for example sa-dataflow.

    The Service Account should be created in the project that runs the job (Dest in your case).
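
    For instance, the account could be created with the gcloud CLI; the project ID below is a placeholder:

    gcloud iam service-accounts create sa-dataflow \
        --project=<dest-project-id> \
        --display-name="Dataflow service account"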

    Give the following roles to this Service Account:

    • BigQuery Data Viewer/BigQuery Job User in the Source project
    • BigQuery Data Editor/BigQuery Job User in the Dest project

    I used predefined roles in this example because it's simpler and helps you solve the problem more easily, but you can also use custom roles instead, in order to respect the least-privilege principle.
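
    As an illustration, these bindings could be granted with gcloud; the project IDs below are placeholders:

    # In the Source project: read the data and run query jobs
    gcloud projects add-iam-policy-binding <source-project-id> \
        --member="serviceAccount:sa-dataflow@<dest-project-id>.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer"

    gcloud projects add-iam-policy-binding <source-project-id> \
        --member="serviceAccount:sa-dataflow@<dest-project-id>.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"

    # In the Dest project: write the results and run query jobs
    gcloud projects add-iam-policy-binding <dest-project-id> \
        --member="serviceAccount:sa-dataflow@<dest-project-id>.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataEditor"

    gcloud projects add-iam-policy-binding <dest-project-id> \
        --member="serviceAccount:sa-dataflow@<dest-project-id>.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"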

    Then, when you run the Beam/Dataflow job, pass the Service Account email as a program argument:

    python -m folder.main \
        --runner=DataflowRunner \
        --service_account_email=sa-dataflow@<project-id>.iam.gserviceaccount.com \
        ....
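
    For completeness, a typical invocation also includes the usual Dataflow options; the values below are placeholders:

    python -m folder.main \
        --runner=DataflowRunner \
        --project=<dest-project-id> \
        --region=<region> \
        --temp_location=gs://<bucket>/tmp \
        --service_account_email=sa-dataflow@<dest-project-id>.iam.gserviceaccount.com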