We are trying to delete data from a Delta Lake table using an AWS Glue job. Please suggest why the delete condition in the merge is not working.
This works fine if my delete_condition is like
changes.flag = True
However, it does not perform any deletes if the delete_condition is like
source.date_field > date_sub(current_date(),7)
It also works fine if I use a direct delete instead of the merge:
delta_source.delete("date_field > date_sub(current_date(), 7)")
The merge part of the code is:
from delta.tables import DeltaTable

delta_source = DeltaTable.forPath(spark, delta_path)
# insert_command holds the merge (match) condition passed to merge()
delta_merger_0 = delta_source.alias("source").merge(
    latest_change_for_each_key.alias("changes"), insert_command
)
# delete_condition is evaluated only for rows that satisfy the merge condition
delta_merger_0 = delta_merger_0.whenMatchedDelete(condition=delete_condition)
delta_merger_0.whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
Here delta_source is the Delta Lake table and latest_change_for_each_key is the data frame of incremental records.
I found the solution to it.
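The merge first evaluates the match condition (say source.primary_key = target.primary_key), and it runs the delete action only for rows where that match condition is satisfied. The delete condition is an extra filter on the matched rows, not a standalone predicate over the whole table.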
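The behaviour described above can be illustrated with a toy model in plain Python. This is only a sketch of the semantics, not how Delta Lake implements merge; the function name, the `days_old` column, and the sample rows are all made up for illustration.

```python
def merge_with_matched_delete(target_rows, change_rows, key, delete_if):
    """Toy model of a merge with a conditional matched-delete clause.

    A target row is deleted only if (1) a change row with the same key
    exists AND (2) delete_if(target_row, change_row) is true.
    """
    changes_by_key = {row[key]: row for row in change_rows}
    kept = []
    for row in target_rows:
        match = changes_by_key.get(row[key])
        if match is not None and delete_if(row, match):
            continue  # matched and condition holds: row is deleted
        kept.append(row)  # unmatched rows are never deleted
    return kept


# Hypothetical data: ids 1 and 2 are both "recent", but only id 2 has a match.
target = [
    {"id": 1, "days_old": 2},
    {"id": 2, "days_old": 3},
    {"id": 3, "days_old": 30},
]
changes = [{"id": 2}]

remaining = merge_with_matched_delete(
    target, changes, key="id",
    delete_if=lambda tgt, chg: tgt["days_old"] < 7,
)
print([row["id"] for row in remaining])  # [1, 3]
```

Row 1 satisfies the date-style condition but survives because nothing in the incoming changes matches its key, which is the situation in the question.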
In my case the table can contain rows that satisfy the delete condition but have no matching key in the incoming data, so the merge never reaches them. Rather than using merge-delete, we should run a direct delete on the Delta table.
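A minimal sketch of that two-step approach is below. The helper name `delete_then_merge` and its parameters are hypothetical; it assumes `delta_table` is a `DeltaTable`, `changes_df` is the incremental data frame, and both conditions are SQL strings. Running the delete before the merge is also an assumption: if freshly merged rows should be purged too, the delete could be run after the merge instead.

```python
def delete_then_merge(delta_table, changes_df, merge_condition, delete_predicate):
    """Delete by predicate first, then upsert the incremental records.

    delta_table      -- a delta.tables.DeltaTable (assumed)
    changes_df       -- DataFrame of incoming changes (assumed)
    merge_condition  -- key match, e.g. "source.id = changes.id" (hypothetical)
    delete_predicate -- e.g. "date_field > date_sub(current_date(), 7)"
    """
    # Step 1: direct delete; applies to every row that satisfies the
    # predicate, whether or not its key appears in changes_df.
    delta_table.delete(delete_predicate)

    # Step 2: plain upsert; no whenMatchedDelete clause is needed any more.
    (
        delta_table.alias("source")
        .merge(changes_df.alias("changes"), merge_condition)
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```

With the names from the question, the call would look like `delete_then_merge(delta_source, latest_change_for_each_key, insert_command, "date_field > date_sub(current_date(), 7)")`.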