I have a SAP database on premise and I want to push daily some of this data (some tables from SAP database) to Azure. The data will be uploaded using a merge strategy (update if a change is detected, delete if a record is missing, otherwise insert). I am thinking to do this using the following setup:
- Install on premise integration Services (SSIS). The client already has license for this.
- Install SAP connector on premise.
- Pull data from SAP into text files (on premise, into multiple CSV files) (create ETL's for this).
- Push those files into a Blob Storage (via AzCopy), so I create a job or process somehow locally that will handle newly created files.
- Load data into Data Warehouse using PolyBase (https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-load-with-polybase) using an UP-SERT strategy.
This will work for sure but I am wondering whether it's possible to upload directly that data using an ETL (using an SSIS package), basically skipping the steps:
My concerns are in terms of: - Performance - Costs
What do you think is the best way?
Thank you!
You can use SQL Server Integration Services (SSIS) to load data to Azure SQL Data Warehouse via the Azure SQL DW Upload Task which is part of the Azure Feature Pack for Integration Services. This task takes files from a local file share and loads them to warehouse tables using Polybase in the background. Using this method you would not have to explicitly push files into Blob Storage or use AzCopy. This design looks like:
SAP > flat files > SQL DW Upload Task
You can also load data directly to a warehouse table using a Dataflow task but historically performance was very slow. The advantage I suppose of this design is that you don't have to extract the files to flat files. I haven't tried this recently as the recommended method is Polybase. This design looks like:
SAP > Azure SQL Data Warehouse (using Data Flow task)
An alternative method to consider is Azure Data Factory which now has SAP connectivity (for certain products). You would need a gateway to facilitate the copy from on-premises to Azure. ADF is potentially more scalable but there is a bit of a learning curve if you have not used it before.