Tags: google-cloud-platform, google-cloud-storage, google-cloud-dataflow, google-cloud-spanner

Can I run a Dataflow job between projects?


I want to export data from Cloud Spanner in project A to GCS in project B as Avro. If my service account in project B is given spanner.read access in project A, can I run a Dataflow job from project B with the template Cloud_Spanner_to_GCS_Avro and write to GCS in project B?

I've tried both in the console and with the following command:

gcloud dataflow jobs run my_job_name \
  --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
  --region=my_region \
  --parameters='instanceId=name_of_instance,databaseId=databaseid,outputDir=my_bucket_url' \
  --service-account-email=my_serviceaccount_email

I'm not sure how to specify the projectId of the Spanner instance. When run from project B, this command looks in project B's Spanner and cannot find the instance and database.

I've tried setting instanceId=projects/id_of_project_A/instances/name_of_instance, but it's not valid input for the template.
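For reference, Spanner databases do have fully qualified resource names of the form shown below, but the template's instanceId parameter expects only the bare instance ID, which is why the path-style value is rejected. A small illustrative sketch (the project, instance, and database names are the placeholders from the question, not real resources):

```python
# Compose a fully qualified Cloud Spanner database resource name.
# Note: the Cloud_Spanner_to_GCS_Avro template's instanceId parameter
# does NOT accept this form -- it expects just the bare instance ID.

def spanner_database_path(project_id: str, instance_id: str, database_id: str) -> str:
    """Return the fully qualified Spanner database resource name."""
    return f"projects/{project_id}/instances/{instance_id}/databases/{database_id}"

path = spanner_database_path("id_of_project_A", "name_of_instance", "databaseid")
print(path)
# projects/id_of_project_A/instances/name_of_instance/databases/databaseid
```

This addressing scheme is how Spanner APIs identify resources across projects; the limitation here is that the template surface exposes no parameter that takes the project component.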


Solution

  • So the answer seems to be that this is possible for some templates, or if you write a custom one, but not for the template I want to use (batch export from Spanner to GCS as Avro files). Support might be added in a future update to the template.
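If a custom template were written, it would presumably take the Spanner project as an explicit parameter. A minimal sketch of composing the --parameters value for such a hypothetical template (the spannerProjectId parameter name is an assumption, not part of the stock template):

```python
# Sketch: building the key=value,key=value string that gcloud's
# --parameters flag expects, for a HYPOTHETICAL custom template that
# takes the Spanner project explicitly via spannerProjectId.
# All resource names below are the placeholders from the question.

def format_job_parameters(params: dict) -> str:
    """Render a parameter dict in gcloud's --parameters format."""
    return ",".join(f"{key}={value}" for key, value in params.items())

parameters = format_job_parameters({
    "spannerProjectId": "id_of_project_A",  # project owning the Spanner instance (hypothetical parameter)
    "instanceId": "name_of_instance",
    "databaseId": "databaseid",
    "outputDir": "my_bucket_url",
})
print(parameters)
# spannerProjectId=id_of_project_A,instanceId=name_of_instance,databaseId=databaseid,outputDir=my_bucket_url
```

The resulting string would be passed as --parameters='...' in the gcloud command shown in the question, alongside a cross-project IAM grant for the job's service account.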