I am working on a flexible, parameterized framework/pipeline that archives/purges data using Azure Synapse. The current setup includes a few Delta tables that cannot be handled like normal blobs or tables. I have read the Microsoft documentation on setting the retention period in Databricks, and I can even VACUUM the unused data files there. Is there an alternative way to achieve the same thing in Azure Synapse?
I have a framework blueprint in place, but it was designed with non-Delta data files in mind.
Thank you @Tyler Long for the excellent documentation on Data Lake in Azure Synapse Analytics.
You mentioned that you want an approach for setting the retention period and vacuuming unused data files in Azure Synapse, and that you don't want to use Databricks. I used an Azure Synapse Spark notebook to perform both the retention-period configuration and the VACUUM.
The statement below reads the current version of the Delta table and rewrites it as a single file. Setting dataChange to false marks the rewrite as pure compaction, so downstream streaming consumers do not reprocess it:
(spark.read
    .format("delta")
    .load("/Delta_Demo/Employees/")     # read the current version of the table
    .repartition(1)                     # compact into a single output file
    .write
    .option("dataChange", "false")      # flag the write as compaction only, not new data
    .format("delta")
    .mode("overwrite")
    .save("/Delta_Demo/Employees/"))    # overwrite the table in place
With the retention window in place, run VACUUM from a notebook SQL cell to physically delete the unused data files. The first form uses the table's configured retention threshold (7 days unless overridden); the second explicitly retains the last 240 hours (10 days) of history:
VACUUM default.DELTA_Employees
VACUUM default.DELTA_Employees RETAIN 240 HOURS
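If you prefer to keep everything in PySpark, the same vacuum can be run through the Delta Lake Python API. Here is a minimal sketch, assuming the delta-spark package is available on the Synapse Spark pool and reusing the table path from the compaction step above:

from delta.tables import DeltaTable

# Bind to the same Delta table used in the compaction step
delta_table = DeltaTable.forPath(spark, "/Delta_Demo/Employees/")

# Delete files that are no longer referenced by the table and are
# older than the 240-hour retention window
delta_table.vacuum(240)

Note that VACUUM refuses retention windows shorter than the default 168 hours unless spark.databricks.delta.retentionDurationCheck.enabled is set to false; 240 hours is above that threshold, so no extra configuration is needed here.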