I am trying to include a schema change (a new column type and a dropped column) in Delta tables. As per the documentation:
(df3
  .write
  .format("delta")
  .mode("overwrite")
  .option("mergeSchema", "true")
  .save(deltapath))
This way I lose the old data. Is there any way to keep the old data before overwriting with the new schema, apart from taking a backup? I have posted the same question on the Databricks forum; link for your reference: https://community.databricks.com/s/question/0D58Y00009WjZllSAF/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in-schema-evaluation
One way is to create a backup of the file before overwriting it, but I want to know the best way.
The command you shared won't delete the existing data, and it should still be queryable through Delta time travel.
For example:
Create a table with 5 rows
spark.range(5).write.format("delta").save("/tmp/table")
Overwrite with a new column and the mergeSchema option
from pyspark.sql.functions import lit

(spark.range(5)
  .withColumn("data", lit("data"))
  .write.format("delta")
  .mode("overwrite")
  .option("mergeSchema", "true")
  .save("/tmp/table"))
Query original data
SELECT * FROM delta.`/tmp/table` VERSION AS OF 0
+---+
| id|
+---+
| 0|
| 1|
| 2|
| 3|
| 4|
+---+
Query new/current state
SELECT * FROM delta.`/tmp/table` VERSION AS OF 1
+---+----+
| id|data|
+---+----+
| 0|data|
| 1|data|
| 2|data|
| 3|data|
| 4|data|
+---+----+