Tags: google-cloud-platform, google-cloud-dataflow, google-cloud-iam

Need to access a storage bucket in project B from a Dataflow job running in project A (Dataflow job failing)


I am running a Dataflow job in project A that needs to access a storage bucket in project B. I asked the admin to add the service accounts of project A to project B and grant the required permissions. Below I have attached a screenshot of project B, whose storage bucket I need to read from the Dataflow job running in project A, which then loads a BigQuery table in project B.

The project A service account (highlighted as A) has been added to project B and given the BigQuery Admin and Storage Admin roles.

The Dataflow runner service account has also been added, but it looks like the admin granted it the BigQuery Admin and Storage Admin roles there, not Compute Network User. I am not sure whether I also need to add the Dataflow runner service account of A to B, but I am getting the error below when running the Dataflow job.

It mainly says: "-compute@developer.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket." ("domain": "global", "reason": "forbidden").

In the screenshot below I have labelled the project names as A and B for clarity; I was not sure of a better way to explain it.

Do I need to add the -compute@developer.gserviceaccount.com account of A to project B as well? Please advise.

[screenshot: IAM members and roles in project B]

[second screenshot]


Solution

  • When you run a Dataflow job, you have workers. If you take a closer look at your project, the workers are plain Compute Engine instances, and when a Compute Engine instance is deployed, the service account it uses by default is -compute@developer.gserviceaccount.com. So it is this identity that tries to access your different components (Cloud Storage and BigQuery here).

    So, grant the required permissions on the correct resources, and don't do it at the project level; it is better to do it at the bucket or dataset level (a sketch of those grants is shown after this answer). If you don't know how to do it, let me know.

    The other solution is to specify a custom service account at runtime in your Dataflow job, so that the workers are created with the service account you provide rather than the Compute Engine default one. You can do this with the gcloud CLI, for example (see the second sketch below).


    You can find more details about Dataflow permissions here.
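
A minimal sketch of the bucket-level and dataset-level grants mentioned above. PROJECT_A_NUMBER, BUCKET_IN_PROJECT_B, PROJECT_B and DATASET are placeholders for your own values, and roles/storage.objectViewer is just an example role that includes the storage.objects.list permission from the error; use a broader role if the job also has to write objects.

```sh
# Bucket-level binding in project B for the project A worker service account
# (instead of a project-wide Storage Admin role).
gsutil iam ch \
  serviceAccount:PROJECT_A_NUMBER-compute@developer.gserviceaccount.com:roles/storage.objectViewer \
  gs://BUCKET_IN_PROJECT_B

# Dataset-level access in project B for the BigQuery load: dump the dataset
# definition, add an "access" entry for the service account, then update it.
bq show --format=prettyjson PROJECT_B:DATASET > dataset.json
# Add to the "access" array in dataset.json, for example:
#   { "role": "WRITER",
#     "userByEmail": "PROJECT_A_NUMBER-compute@developer.gserviceaccount.com" }
bq update --source dataset.json PROJECT_B:DATASET
```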
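
And a sketch of the second option, launching the job with a custom service account through the gcloud CLI. The job name, template location and dataflow-sa@PROJECT_A.iam.gserviceaccount.com are placeholders; the custom account still needs roles/dataflow.worker in project A plus the bucket/dataset access in project B described above.

```sh
# Run a Dataflow job with a custom worker service account instead of the
# Compute Engine default one.
gcloud dataflow jobs run my-job \
    --gcs-location=gs://YOUR_TEMPLATE_BUCKET/templates/YOUR_TEMPLATE \
    --region=us-central1 \
    --service-account-email=dataflow-sa@PROJECT_A.iam.gserviceaccount.com

# Whoever launches the job needs roles/iam.serviceAccountUser on that account.
# Equivalent pipeline options when submitting your own pipeline code:
#   Python: --service_account_email=dataflow-sa@PROJECT_A.iam.gserviceaccount.com
#   Java:   --serviceAccount=dataflow-sa@PROJECT_A.iam.gserviceaccount.com
```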