google-cloud-platform · google-cloud-dataflow · apache-beam · batch-processing

Unable to execute Dataflow Pipeline : "Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow"


I'm trying to run a Dataflow batch job using the "Text Files on Cloud Storage to BigQuery" template. The first three steps run successfully, but the final stage fails with the following error:

    Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000, reached max retries: 3, last failed job:
    {
      "configuration" : {
        "jobType" : "LOAD",
        "labels" : { "beam_job_id" : "2022-11-10_02_06_07-15255037958352274885" },
        "load" : {
          "createDisposition" : "CREATE_IF_NEEDED",
          "destinationTable" : {
            "datasetId" : "minerals_test_dataset",
            "projectId" : "jio-big-data-poc",
            "tableId" : "mytable01"
          },
          "ignoreUnknownValues" : false,
          "sourceFormat" : "NEWLINE_DELIMITED_JSON",
          "useAvroLogicalTypes" : false,
          "writeDisposition" : "WRITE_APPEND"
        }
      },
      "etag" : "LHqft9L/H4XBWTNZ7BSRXA==",
      "id" : "jio-big-data-poc:asia-south1.beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2",
      "jobReference" : {
        "jobId" : "beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2",
        "location" : "asia-south1",
        "projectId" : "jio-big-data-poc"
      },
      "kind" : "bigquery#job",
      "selfLink" : "https://bigquery.googleapis.com/bigquery/v2/projects/jio-big-data-poc/jobs/beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2?location=asia-south1",
      "statistics" : {
        "creationTime" : "1668074949767",
        "endTime" : "1668074949869",
        "startTime" : "1668074949869"
      },
      "status" : {
        "errorResult" : {
          "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)",
          "reason" : "invalid"
        },
        "errors" : [ {
          "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)",
          "reason" : "invalid"
        } ],
        "state" : "DONE"
      },
      "user_email" : "49449455496-compute@developer.gserviceaccount.com",
      "principal_subject" : "serviceAccount:49449455496-compute@developer.gserviceaccount.com"
    }
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200)
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153)
    org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:378)


I tried running the same job with CSV files from other datasets, and the JavaScript UDF and JSON schema follow the documentation, but the job keeps failing at the same stage. So, what could be the possible solution to this error?
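For reference, the job is launched with this template's standard parameters. The sketch below is only an approximation of the setup: the gs://my-bucket/... paths and the UDF function name are placeholders, not the exact values used here.

    # Approximate launch of the "Text Files on Cloud Storage to BigQuery"
    # (GCS_Text_to_BigQuery) template via the Dataflow templates API.
    # All gs://my-bucket/... paths and the UDF function name are placeholders.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")
    response = (
        dataflow.projects()
        .locations()
        .templates()
        .launch(
            projectId="jio-big-data-poc",
            location="asia-south1",
            gcsPath="gs://dataflow-templates-asia-south1/latest/GCS_Text_to_BigQuery",
            body={
                "jobName": "textiotobigquerydataflow",
                "parameters": {
                    "inputFilePattern": "gs://my-bucket/input/*.csv",
                    "JSONPath": "gs://my-bucket/schema/schema.json",
                    "javascriptTextTransformGcsPath": "gs://my-bucket/udf/transform.js",
                    "javascriptTextTransformFunctionName": "transform",
                    "outputTable": "jio-big-data-poc:minerals_test_dataset.mytable01",
                    "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",
                },
            },
        )
        .execute()
    )
    print(response["job"]["id"])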


Solution

  • The JSON schema you provided does not match the BigQuery schema of your table:

    "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ]
    

    There is a field called marks in your input schema that does not exist in the BigQuery table.

    If you update your BigQuery table schema so that it matches the fields of your input JSON elements exactly, that will solve the issue; one way to do that is sketched below.
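    Assuming the marks field really should be loaded, you can add the missing column(s) to the table before re-running the template. This is a minimal sketch using the BigQuery Python client: the local schema.json path stands in for the schema file passed to the template via JSONPath, and the top-level "BigQuery Schema" key is the file format described in the template documentation.

        import json

        from google.cloud import bigquery

        client = bigquery.Client(project="jio-big-data-poc")
        table = client.get_table("jio-big-data-poc.minerals_test_dataset.mytable01")

        # Fields declared in the schema file given to the template (JSONPath).
        # The top-level "BigQuery Schema" array follows the template's documented
        # file format; the local path is a placeholder.
        with open("schema.json") as f:
            declared = json.load(f)["BigQuery Schema"]

        # Find declared fields that the live table does not have yet (e.g. marks).
        existing = {field.name for field in table.schema}
        missing = [f for f in declared if f["name"] not in existing]

        if missing:
            table.schema = list(table.schema) + [
                bigquery.SchemaField(f["name"], f["type"], mode=f.get("mode", "NULLABLE"))
                for f in missing
            ]
            client.update_table(table, ["schema"])  # adds the missing columns to the table

    Alternatively, if marks should not be loaded at all, remove it from the schema file and from the records your UDF emits: as the error shows, the WRITE_APPEND load job will not add new columns to the existing table on its own.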