sql-server, azure, database-design, etl

How to back up and maintain a SQL Server table in Azure?


I have created an Azure Data Factory (ADF) pipeline that runs ETL to produce a 15 GB clean file, "Final_Data_2023.csv", in an Azure storage container, and this clean file then gets copied into the SQL Server table dbo.Final_Table.

This process runs every month: next month we will prepare a new Final_Data_2023.csv, and I will need to truncate dbo.Final_Table and load the new data into it. My concern is that the new data could turn out to be completely wrong, in which case, as a quick fix, I would need the old Final_Data_2023.csv back in dbo.Final_Table. Since I truncate the table, the old data cannot be recovered.

How should I design my architecture so that I can quickly access the previous month's data and revert the table if something goes wrong?

It doesn't have to be a small workaround.


Solution

  • An Azure Storage container is well suited to holding large amounts of data, so you can design the pipeline to keep all of the backup data in the container and copy only the latest file into the SQL Server table.

    Approach:

    • Store the file in the Azure storage container with the current date included in its file name.
    • Copy that same file from the container into the database.

    Below are the detailed steps.

    • Store the file name prefixed with the current date in a pipeline variable using a Set Variable activity, for example: @concat(substring(utcnow(),0,10),'_filename.csv') (see the pipeline sketch after these steps).

    • Add a copy activity to copy the data from the SQL Server table to the storage container.

    • In this copy activity, use the SQL Server table as the source dataset; in the sink (storage container) dataset, create a parameter for the file name.


    • Reference the parameter as @dataset().fileName in the sink dataset's file path (see the dataset sketch after these steps).

    • In the copy activity's sink settings, pass the variable value to the dataset parameter.


    • Add another copy activity to copy from the container to the database: use the same parameterized container dataset (the sink dataset of copy activity 1) as the source here, and pass the variable to its dataset parameter (see the pipeline sketch below).


    • For the sink dataset of this second copy activity, use the SQL Server table dataset.

    This way, every backup file is kept in the Azure container and only the new clean file is copied to the database. If you need to roll back to a previous version of the file, you can copy that dated file from the container back into the table.
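
    For reference, the parameterized sink dataset described in the steps above corresponds to a delimited-text dataset definition roughly like the sketch below. This is a trimmed illustration rather than the exact configuration from the screenshots: the dataset name BackupCsv, the linked service AzureBlobStorageLS, and the container name backup-container are placeholders, and the format settings are reduced to the essentials.

    {
        "name": "BackupCsv",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "fileName": { "type": "string" }
            },
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "fileName": {
                        "value": "@dataset().fileName",
                        "type": "Expression"
                    },
                    "container": "backup-container"
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            },
            "schema": []
        }
    }

    Because the file name is an expression on a dataset parameter, every copy activity that uses this dataset can supply a different dated name, so you do not need a separate dataset per month.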
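
    The pipeline then wires the steps together: a Set Variable activity builds the dated file name, the first copy activity exports dbo.Final_Table to that dated file, and the second copy activity loads a dated file from the container back into the table. The JSON below is a trimmed sketch under the same assumptions: the pipeline, activity, and variable names are placeholders, FinalTableSql stands for a dataset pointing at dbo.Final_Table, the AzureSqlSource/AzureSqlSink types would be SqlServerSource/SqlServerSink if the table sits on a SQL Server instance rather than Azure SQL Database, and the preCopyScript is an added assumption that performs the truncate-before-load described in the question.

    {
        "name": "MonthlyLoadWithBackup",
        "properties": {
            "variables": {
                "backupFileName": { "type": "String" }
            },
            "activities": [
                {
                    "name": "SetBackupFileName",
                    "type": "SetVariable",
                    "typeProperties": {
                        "variableName": "backupFileName",
                        "value": {
                            "value": "@concat(substring(utcnow(),0,10),'_Final_Data_2023.csv')",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "CopyTableToContainer",
                    "type": "Copy",
                    "dependsOn": [
                        { "activity": "SetBackupFileName", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "source": { "type": "AzureSqlSource" },
                        "sink": { "type": "DelimitedTextSink" }
                    },
                    "inputs": [
                        { "referenceName": "FinalTableSql", "type": "DatasetReference" }
                    ],
                    "outputs": [
                        {
                            "referenceName": "BackupCsv",
                            "type": "DatasetReference",
                            "parameters": {
                                "fileName": {
                                    "value": "@variables('backupFileName')",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                },
                {
                    "name": "CopyContainerToTable",
                    "type": "Copy",
                    "dependsOn": [
                        { "activity": "CopyTableToContainer", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "source": { "type": "DelimitedTextSource" },
                        "sink": {
                            "type": "AzureSqlSink",
                            "preCopyScript": "TRUNCATE TABLE dbo.Final_Table"
                        }
                    },
                    "inputs": [
                        {
                            "referenceName": "BackupCsv",
                            "type": "DatasetReference",
                            "parameters": {
                                "fileName": {
                                    "value": "@variables('backupFileName')",
                                    "type": "Expression"
                                }
                            }
                        }
                    ],
                    "outputs": [
                        { "referenceName": "FinalTableSql", "type": "DatasetReference" }
                    ]
                }
            ]
        }
    }

    To roll back, rerun the second copy activity (or a small pipeline containing just that copy) and pass an older dated file name to the dataset parameter instead of the variable value.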