google-cloud-dataflow

How to fix "java.lang.RuntimeException: Failed to create job" in a Dataflow template job that writes to BigQuery?


I'm trying to use the JDBC to BigQuery Dataflow template to copy data from a Postgres database to BigQuery, but the Dataflow job is failing with the error below:

java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_jdbctobigquerydataflow0releaser1025092115d7a229e9_214eff91b59f4b8d863809d3865504fa_11cbacad09f05e44363d2dd2963e9fd1_00001_00000, reached max retries: 3, last failed job: null.

    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:378)

I've read other Stack Overflow posts and have already tried the following:

  1. Ensured that the Dataflow worker service account is granted the BigQuery Admin and BigQuery User roles on the dataset I'm writing to.

  2. Confirmed the database isn't too large - I'm only copying <10 rows since I'm just testing it out.

  3. Ensured that the schema in the Postgres DB matches the BigQuery table's schema.

None of the above worked. Is there anything else I can try? Thanks!


Solution

  • Is the project where the dataset is located different from the one where Dataflow is running? If they are different, you will also need to grant the BigQuery User role in the Dataflow project, because the BigQuery load job is initiated in that project (see the sketch below).

    (the BigQuery Admin role for the destination dataset would remain as is)
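
A minimal sketch of that grant with the gcloud CLI, assuming hypothetical values: the project ID and worker service account below are placeholders, not values from the question, so substitute your own (the worker account is often the Compute Engine default service account unless you specified one). The first command is an optional check of the roles the worker account already holds in the Dataflow project.

    # Placeholders (assumptions) - replace with your Dataflow project and worker service account.
    DATAFLOW_PROJECT=my-dataflow-project
    WORKER_SA=my-worker-sa@${DATAFLOW_PROJECT}.iam.gserviceaccount.com

    # Optional: list the roles the worker service account currently has in that project.
    gcloud projects get-iam-policy "$DATAFLOW_PROJECT" \
      --flatten="bindings[].members" \
      --filter="bindings.members:serviceAccount:$WORKER_SA" \
      --format="table(bindings.role)"

    # Grant BigQuery User in the Dataflow project so the worker can create the load job there.
    gcloud projects add-iam-policy-binding "$DATAFLOW_PROJECT" \
      --member="serviceAccount:$WORKER_SA" \
      --role="roles/bigquery.user"

After the binding propagates, rerun the template job; the load job prefixed beam_bq_job_LOAD_... should then be created in the Dataflow project without the retry failure.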