I am performing a merge operation on a Databricks Delta table as below -
spark.sql("""
MERGE INTO <delta table name> deltatbl
USING <temp view> source
ON deltatbl.col1 = source.col1
AND deltatbl.col2 = source.col2
WHEN NOT MATCHED THEN
  INSERT (col1, col2) VALUES (source.col1, source.col2)
""")
The above query is inserting duplicate records despite the keys matching. How can I achieve an output where only non-matching records are inserted? All columns are part of the key.
From what I can see, one likely problem is that the data you are inserting contains duplicates itself. WHEN NOT MATCHED THEN INSERT checks each source row against the target table only; it does not deduplicate the source rows against each other before inserting, so two identical rows in the source will both be inserted.
To avoid loading duplicates into the table, you must deduplicate the source before running the merge.
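To make the failure mode concrete, here is a toy pure-Python simulation (not Spark itself, and the row values are made up) of the WHEN NOT MATCHED THEN INSERT semantics. Each source row is compared against the target snapshot, so two identical unmatched rows in the source both get inserted; deduplicating the source first fixes it:

```python
def merge_insert_not_matched(target, source):
    """Toy model of MERGE ... WHEN NOT MATCHED THEN INSERT.

    Rows are (col1, col2) tuples; since all columns are part of the
    key, the whole tuple is the key. Each source row is checked
    against the target snapshot only, never against other source rows.
    """
    existing = set(target)
    result = list(target)
    for row in source:
        if row not in existing:
            # `existing` is deliberately NOT updated here: this mirrors
            # how duplicates within one source batch all pass the check.
            result.append(row)
    return result

def merge_insert_deduped(target, source):
    """Same merge, but deduplicate the source first (like dropDuplicates)."""
    deduped = list(dict.fromkeys(source))  # order-preserving dedup
    return merge_insert_not_matched(target, deduped)

target = [("a", 1)]
source = [("a", 1), ("b", 2), ("b", 2)]   # duplicate inside the batch

print(merge_insert_not_matched(target, source))  # ("b", 2) appears twice
print(merge_insert_deduped(target, source))      # ("b", 2) appears once
```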
You can either do this through the python API:
dedup_df = spark.table('{your_view_name_here}').dropDuplicates()
dedup_df.createOrReplaceTempView('{your_view_name_here}')
Or through the SQL API:
SELECT DISTINCT *
FROM {your_view_name_here}
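You can also inline the deduplication into the merge itself by making the source a subquery, so no separate view rewrite is needed. A sketch, reusing the placeholder names from your question:

```sql
MERGE INTO <delta table name> deltatbl
USING (SELECT DISTINCT col1, col2 FROM <temp view>) source
ON deltatbl.col1 = source.col1
AND deltatbl.col2 = source.col2
WHEN NOT MATCHED THEN
  INSERT (col1, col2) VALUES (source.col1, source.col2)
```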
Without any more information, that is the best guess I can provide.