I have a partitioned table where one of the columns is of type DateTime, and the table is partitioned on that same column. According to the spark-bigquery connector documentation, the corresponding Spark SQL type is String: https://github.com/GoogleCloudDataproc/spark-bigquery-connector
I tried writing the column as a String accordingly, but I am getting a datatype mismatch error.
Code Snippet:
// Build the current timestamp in PST and store it as an ISO-8601 string
ZonedDateTime nowPST = ZonedDateTime.ofInstant(Instant.now(), TimeZone.getTimeZone("PST").toZoneId());
// LocalDateTime.toString() yields e.g. "2020-01-01T12:34:56.789"
df = df.withColumn("createdDate", lit(nowPST.toLocalDateTime().toString()));
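For reference, the write itself is the connector's standard indirect save, along these lines (bucket and table names are placeholders):

df.write()
    .format("bigquery")
    .option("temporaryGcsBucket", "<GCS_BUCKET>")
    .mode(SaveMode.Append)
    .save("<DATASET_NAME>.<TABLE_NAME>");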
Error:
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Failed to load to <PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME> in job JobId{project=<PROJECT_ID>, job=<JOB_ID>, location=US}. BigQuery error was Provided Schema does not match Table <PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME>. Field createdDate has changed type from DATETIME to STRING
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.loadDataToBigQuery(BigQueryWriteHelper.scala:156)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:89)
... 36 more
As Spark has no support for DateTime, the BigQuery connector does not support writing DateTime; there is no equivalent Spark data type that can be used. We are exploring ways to augment the DataFrame's metadata in order to support the types that BigQuery supports but Spark does not (DateTime, Time, Geography).
For now, please keep this field as a String and perform the conversion on the BigQuery side.
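A minimal sketch of that conversion, assuming the data was first loaded into a staging table where createdDate is STRING (the table names are placeholders, and the CAST relies on the value being an ISO-8601 string, as produced by LocalDateTime.toString(), with at most six fractional digits):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// BigQuery casts ISO-8601 strings such as "2020-01-01T12:34:56.789"
// to DATETIME directly, so a plain CAST is enough here.
String sql =
    "INSERT INTO `<DATASET_NAME>.<TABLE_NAME>` (createdDate) "
  + "SELECT CAST(createdDate AS DATETIME) FROM `<DATASET_NAME>.<STAGING_TABLE>`";
bigquery.query(QueryJobConfiguration.newBuilder(sql).build());

The same CAST can also live in a view or a scheduled query instead of an INSERT, depending on how the downstream table is consumed.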