I am trying to run a default GCP Dataflow pipeline template (Cloud Spanner to GCS), but every attempt to start the pipeline/job fails with a message indicating that a result file is missing. I have not modified any of the template's default options.
Failed to read the result file :
gs://dataflow-staging-us-central1-11111111111/staging/template_launches/2023-06-19_10_55_21-884192550311219509/operation_result with error message:
(8bac83beae18b544): Unable to open template file: gs://dataflow-staging-us-central1-11111111111/staging/template_launches/2023-06-19_10_55_21-884192550311219509/operation_result..
Interestingly, I managed to get the pipeline working once, a few days ago. I stopped the pipeline, attempted it again today, and every variation failed.
At this point, I am clueless as to why that one attempt a few days ago worked while all of my current efforts fail.
Any idea why the jobs keep failing?
Let me preface: I tried everything using the Google Cloud Console (web UI) and wasn't able to get a pipeline running. However, creating a job with the gcloud CLI worked:

1. Run gcloud auth application-default login in your local shell.
2. Replace us-central1 in the template file location and everywhere else to match your region. More params here.

gcloud dataflow flex-template run spanner-to-bigquery \
--template-file-gcs-location=gs://dataflow-templates-us-central1/2023-06-06-00_RC00/flex/Spanner_Change_Streams_to_BigQuery \
--region us-central1 \
--project=<your_gcs_project> \
--service-account-email=dataflow-spanner-to-bq@<your_gcs_project>.iam.gserviceaccount.com \
--parameters \
spannerInstanceId=development,\
spannerDatabase=main,\
spannerMetadataInstanceId=development,\
spannerMetadataDatabase=main-meta,\
spannerChangeStreamName=AllStream,\
bigQueryDataset=dev_spanner_all,\
numWorkers=1,\
enableStreamingEngine=true,\
tempLocation=gs://create_a_gs_bucket/tmp,\
stagingLocation=gs://create_a_gs_bucket/staging,\
workerRegion=us-central1
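If the launch still fails, a few diagnostic commands can help narrow things down. This is a hypothetical checklist, not something from the original error report; the bucket name matches the placeholder used in the command above, and JOB_ID is a placeholder you must replace.

```shell
# Confirm the staging/temp bucket exists and is readable; the
# "Failed to read the result file" error often points at a
# staging-bucket access problem.
gsutil ls gs://create_a_gs_bucket/

# List recent Dataflow jobs in the region to see whether the
# launch was registered at all.
gcloud dataflow jobs list --region=us-central1 --status=all

# Inspect a specific failed job for more detail (replace JOB_ID).
gcloud dataflow jobs describe JOB_ID --region=us-central1
```

These commands require authenticated access to the project, so run them with the same credentials you used for the flex-template launch.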
This isn't a solution, merely a poor workaround. It seems to be a GCP issue that should be addressed (or at least some kind of resolvable error message should be displayed).
Not a solution per se, but creating a new GCP project and replicating a simple Spanner, GCS, and Dataflow setup worked without any issues in the new environment.
It seems as if there is metadata, or there are leftover fragments, preventing the Dataflow pipelines in my other project from launching correctly.
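For reference, the fresh-project workaround can be sketched like this. The project ID and the exact list of APIs are assumptions on my part; adapt them to your setup.

```shell
# Hypothetical: create a throwaway project for the replication test
# (the project ID is a placeholder).
gcloud projects create my-dataflow-test-project

# Enable the services a Spanner -> Dataflow -> GCS setup needs.
gcloud services enable \
  dataflow.googleapis.com \
  spanner.googleapis.com \
  storage.googleapis.com \
  --project=my-dataflow-test-project
```

After that, recreate the Spanner instance, the GCS bucket, and run the same flex-template command as above against the new project.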
EDIT
Out of curiosity, I disabled all related APIs in the GCP project where the Dataflow launch keeps failing. Even after disabling and re-enabling them, the launch still fails.
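The disable/re-enable cycle mentioned above looks roughly like this (the service name is the one relevant here; --force is needed because other resources may depend on the API):

```shell
# Hypothetical: disable and re-enable the Dataflow API in the
# failing project. --force also disables dependent services,
# so use it carefully.
gcloud services disable dataflow.googleapis.com --force
gcloud services enable dataflow.googleapis.com
```

In my case, even after this cycle the launches kept failing.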