Tags: azure, azure-devops, azure-data-factory, azure-resource-manager, cicd

CI/CD of Data Factory


At one of our customer sites, we use Azure Data Factory (ADF) to support multiple projects, organizing the pipeline and dataset sections into folders by project. However, we've encountered an issue with the deployment process that affects project independence in shared environments.

Here’s the problem scenario:

Imagine two developers, each working on separate projects, Project A and Project B.

  1. Developer 1 makes updates to Project A and pushes their code to the main branch in the DEV environment.
  2. Developer 2 simultaneously works on Project B and also pushes their updates to the main branch in DEV.

When Developer 2 is ready to promote Project B to the UAT environment, the current setup inadvertently deploys both Project A and Project B changes to UAT. This setup is problematic because the UAT environment unintentionally receives updates from Project A, which should remain isolated.

Our current deployment process uses ARM templates, specifically with an Azure Resource Group deployment task, but this setup lacks granular control to separate project-specific changes effectively.

- task: AzureResourceGroupDeployment@2
  displayName: 'Azure Deployment:Create Or Update Resource Group Data Factory'
  inputs:
    azureSubscription: 'AZURE-BIC-TST-WEU'
    resourceGroupName: rgweucgitdwhreports
    location: 'west europe'
    csmFile: '$(drop)/dfweucgiddwhreports/ARMTemplateForFactory.json'
    csmParametersFile: '$(drop)/dfweucgiddwhreports/ARMTemplateParametersForFactory_WEU_TST.json'
  timeoutInMinutes: 60

What I envision (I am not sure whether it is possible) is that, before running the release pipeline, the user supplies a value for a pipeline variable.

If the user enters `ProjectB`, the deployment should only look at the folder in ADF where ProjectB lives.
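
For illustration, a minimal sketch of how such an input could be declared, assuming the pipeline is defined in YAML rather than as a classic release. The parameter name `projectFolder` and the echo step are hypothetical and not part of the existing setup. Declaring the parameter only collects the value at queue time; on its own it does not restrict what the ARM deployment task deploys.

```yaml
# Hypothetical runtime parameter; the name "projectFolder" is an assumption.
parameters:
  - name: projectFolder
    displayName: 'ADF folder to deploy'
    type: string
    default: ProjectB
    values:
      - ProjectA
      - ProjectB

steps:
  # Only prints the chosen value; nothing here filters the ARM template.
  - script: echo "Requested folder: ${{ parameters.projectFolder }}"
    displayName: 'Show requested project folder'
```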

P.S. I understand that this cannot be achieved for everything (linked services, integration runtimes, and so on), but if I could manage to separate pipelines and datasets, that would already be a big step.

I am also open to hearing other options for how this can be done!


Solution

  • According to "Continuous integration and delivery in Azure Data Factory":

    By design, Data Factory doesn't allow cherry-picking of commits or selective publishing of resources. Publishes will include all changes made in the data factory.

    Besides, when you use the AzureResourceGroupDeployment@2 task to deploy your ADF, you are deploying via an ARM template. This template is generated by ADF and contains all the resources in your data factory. An ARM template deployment cannot be limited to part of a template, so the AzureResourceGroupDeployment@2 task cannot deploy only a subset of the resources it contains. In other words, it is not that "this setup lacks granular control to separate project-specific changes effectively"; this is simply how ARM templates work.
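
    As a sketch of what this means for the task in the question, here is the same step with the deployment mode written out. `deploymentMode: 'Incremental'` is the task's default, so stating it changes nothing. In that mode, resources that exist in the resource group but are absent from the template are left alone, yet every resource that is in the template is still created or updated. Because the template generated by ADF covers the whole factory, both projects' pipelines and datasets are deployed.

    ```yaml
    - task: AzureResourceGroupDeployment@2
      displayName: 'Azure Deployment:Create Or Update Resource Group Data Factory'
      inputs:
        azureSubscription: 'AZURE-BIC-TST-WEU'
        resourceGroupName: rgweucgitdwhreports
        location: 'west europe'
        csmFile: '$(drop)/dfweucgiddwhreports/ARMTemplateForFactory.json'
        csmParametersFile: '$(drop)/dfweucgiddwhreports/ARMTemplateParametersForFactory_WEU_TST.json'
        # Default mode, shown explicitly: applies ALL resources in the template;
        # there is no input for selecting only some of them.
        deploymentMode: 'Incremental'
      timeoutInMinutes: 60
    ```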

    In summary, you can't deploy only some of the ADF resources from one environment to another when using an ARM template.