
How to Retain DataType After PySpark Merge with Databricks


I have a Databricks DataFrame with the following schema:

[Screenshot: DataFrame schema before the merge]

You will notice that the data type of the CreatedOn field is DateType().

However, after running a merge and then reading the data back into the variable lakeDataDrop with

lakeDataDrop = spark.read.format("delta").load(saveloc)

the data type of CreatedOn changes to TimestampType().

[Screenshot: schema of lakeDataDrop after the merge]

Can someone explain why the data type changes to TimestampType()?

I would like the data type to remain DateType().


Solution

  • It looks like the corresponding column in the dataset you're using as the source of the merge has the timestamp type. If you have schema evolution enabled, the column type in the target table is then promoted from date to timestamp. The solution is to cast that column to date in the "updates" DataFrame before running the merge.