I Have a Dataframe stored in the format of delta into Adls, now when im trying to append new updated rows to that delta lake it should, Is there any way where i can delete the old existing record in delta and add the new updated Record.
There is a unique Column for the schema of DataFrame stored in Delta. by which we can check whether the record is updated or new.
This is a task for Merge command - you define condition for merge (your unique column) and then actions. In SQL it could look as following (column
is your unique column, and updates
could be your dataframe registered as temporary view):
MERGE INTO destination
USING updates
ON destination.column = updates.column
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED
THEN INSERT *
In Python it could look as following:
from delta.tables import *
deltaTable = DeltaTable.forPath(spark, "/data/destination/")
deltaTable.alias("dest").merge(
updatesDF.alias("updates"),
"dest.column = updates.column") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()