Tags: azure-data-factory, etl

Can Azure Data Factory pipelines dynamically set dataflow source and destination databases


I have three Azure SQL databases that I want to sync data between. All three databases have the same structure; they are set up for three environments (Dev, QA, Prod).

I want a single pipeline that accepts three parameters: the source database, the destination database, and a ClientId to filter the data by. The pipeline would then run a data flow that filters the data by ClientId and upserts it into the destination database.

I have this working with predefined datasets and data sources, but I want to know whether dynamic data sources and datasets are even possible. I'm trying to avoid creating a separate pipeline for every migration direction, e.g. (Dev -> QA), (QA -> Prod), (Dev <- QA), (QA <- Prod).

I currently have a pipeline that accepts a ClientId as a parameter. The pipeline runs a data flow that filters the Client table in Dev; the result is passed to an Alter Row transformation that sets the condition for the upsert, and the data is then inserted or updated in the QA environment. As stated above, this works, but every data source is predefined and I would like to do it dynamically. [data flow screenshot]


Solution

  • You can define a linked service with parameters for the server name and database name:

    [linked service screenshot]
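
    For reference, a parameterized Azure SQL linked service looks roughly like the JSON below. This is only a sketch: the name LS_AzureSql_Dynamic is a placeholder, and the connection string is trimmed (authentication settings are omitted).

        {
          "name": "LS_AzureSql_Dynamic",
          "properties": {
            "type": "AzureSqlDatabase",
            "parameters": {
              "serverName": { "type": "String" },
              "databaseName": { "type": "String" }
            },
            "typeProperties": {
              "connectionString": "Server=tcp:@{linkedService().serverName}.database.windows.net,1433;Initial Catalog=@{linkedService().databaseName};"
            }
          }
        }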

    Then create a dataset connected to that linked service, again using parameters for the server name and database name: [dataset screenshot]
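
    The dataset declares its own matching parameters and passes them through to the linked service. A rough sketch, assuming the table being synced is dbo.Client and the dataset name DS_AzureSql_Dynamic is a placeholder:

        {
          "name": "DS_AzureSql_Dynamic",
          "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": {
              "referenceName": "LS_AzureSql_Dynamic",
              "type": "LinkedServiceReference",
              "parameters": {
                "serverName": "@dataset().serverName",
                "databaseName": "@dataset().databaseName"
              }
            },
            "parameters": {
              "serverName": { "type": "String" },
              "databaseName": { "type": "String" }
            },
            "typeProperties": {
              "schema": "dbo",
              "table": "Client"
            }
          }
        }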

    Use this dataset in the data flow: [data flow screenshot]

    Then, when you use the data flow in a pipeline, it will prompt for the server and database parameters: [pipeline screenshot]
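
    Put together, the pipeline can expose the source/destination as its own parameters and hand them down to the data flow's dataset parameters. The sketch below assumes the data flow is named df_SyncClients and its source and sink transformations are called source1 and sink1 (yours will differ); forwarding the ClientId to the data flow's own filter parameter is left out for brevity.

        {
          "name": "PL_SyncClientData",
          "properties": {
            "parameters": {
              "sourceServer": { "type": "String" },
              "sourceDatabase": { "type": "String" },
              "destinationServer": { "type": "String" },
              "destinationDatabase": { "type": "String" },
              "clientId": { "type": "String" }
            },
            "activities": [
              {
                "name": "Sync client data",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                  "dataflow": {
                    "referenceName": "df_SyncClients",
                    "type": "DataFlowReference",
                    "datasetParameters": {
                      "source1": {
                        "serverName": "@pipeline().parameters.sourceServer",
                        "databaseName": "@pipeline().parameters.sourceDatabase"
                      },
                      "sink1": {
                        "serverName": "@pipeline().parameters.destinationServer",
                        "databaseName": "@pipeline().parameters.destinationDatabase"
                      }
                    }
                  }
                }
              }
            ]
          }
        }

    With this in place, a single pipeline run can point the same data flow at any of the three environments just by changing the parameter values at trigger time, which avoids a separate pipeline per direction.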