I'm trying to run a dataflow batch job using template "Text file on Cloud Storage to BigQuery". First three steps are working but in the last stage it is getting failed giving error:
Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000, reached max retries: 3, last failed job: { "configuration" : { "jobType" : "LOAD", "labels" : { "beam_job_id" : "2022-11-10_02_06_07-15255037958352274885" }, "load" : { "createDisposition" : "CREATE_IF_NEEDED", "destinationTable" : { "datasetId" : "minerals_test_dataset", "projectId" : "jio-big-data-poc", "tableId" : "mytable01" }, "ignoreUnknownValues" : false, "sourceFormat" : "NEWLINE_DELIMITED_JSON", "useAvroLogicalTypes" : false, "writeDisposition" : "WRITE_APPEND" } }, "etag" : "LHqft9L/H4XBWTNZ7BSRXA==", "id" : "jio-big-data-poc:asia-south1.beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "jobReference" : { "jobId" : "beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "location" : "asia-south1", "projectId" : "jio-big-data-poc" }, "kind" : "bigquery#job", "selfLink" : "https://bigquery.googleapis.com/bigquery/v2/projects/jio-big-data-poc/jobs/beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2?location=asia-south1", "statistics" : { "creationTime" : "1668074949767", "endTime" : "1668074949869", "startTime" : "1668074949869" }, "status" : { "errorResult" : { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ], "state" : "DONE" }, "user_email" : "49449455496-compute@developer.gserviceaccount.com", "principal_subject" : "serviceAccount:49449455496-compute@developer.gserviceaccount.com" }. org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200) org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153) org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:378)
I tried running same job with other datasets csv files, also the javascript udf and json schema are according to the documentation, but the job is failing at the same stage. So, what can be the possible solution to this error?
The Json
schema you given doesn't matches the BigQuery
schema of your table :
"Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ]
There is a field called field: marks
that seems to not exists in the BigQuery
table.
If you update your BigQuery
schema to match perfectly with the fields of your input Json
line and element, that will solve the issue.