Search code examples
azureparametersazure-sql-databaseazure-data-factory

Mapping a particular value to column in parametrised copy activity in Azure Data Factory


I am using Azure Data Factory to migrate data from source systems onto Azure SQL Database.

  1. Different source systems - > 2. Azure Database (Data Warehouse) - > 3. Azure Database (Data Mart)

I'll refer to each layers by their number.

Between layers 1 & 2, a Run_id is generated and is a new column added to each table loaded via a Copy activity in Azure Data Factory. The new column is simply added to the end of table to match against each run of the pipelines.

At this current stage, between layer 2 & 3 is a simple one-to-one mapping, however, I also generate a new run_id for each run in layer 3, and so I do not want to keep the data in the Run_id column from layer 2, I want to exclude mapping that column and instead map the newly generated Run_id in layer 3 to each of the tables in layer 3, similar to how I've done in layer 2.

How I've populated Run_id in layer 2 via additional column in Source tab

The issue stems from the fact that the column names Run_id are identical and ADF automatically assumes I want to carry over that existing data in layer 2 to layer 3.

I have parametrised my data migration to look for a list of tables to import in a config table which ADF cycles through via a ForEach activity and hence I cannot simply edit the mappings on the fly as my single copy activity cycles through 50+ tables:

Ingestion pipeline

Inside my ForEach activity, how I've parametrised my copy activity

I have tried to add a specific mapping like in the image attached:

New mapping for specific Run_id value

However when I do so, the actual data I want copied over doesn't copy and only the correct Run_id copies, as you can see in the image attached:

Result of the above mapping

I have thought out swapping my copy activity for a parametrised stored procedure instead which could give me more flexibility

I have also thought about changing the run_id column name for layer 2 so there's a difference in column names but I am keen on preserving column name Run_id in both layer 2 and layer 3.

Essentially, between layer 2 & 3, I need to copy over the data in every column EXCEPT the Run_id column where I need to assign a different value to the value which is in layer 2. Suppose layer 2:

Data Run_id
B 100
A 100

What I need in Layer 3:

Data Run_id
B [NEW Run_id]
A [NEW Run_id]

Thanks and please let me know if you need any clarification.


Solution

  • In mapping settings of copy activity, you need to select either auto-mapping of all columns or manual mapping. There is no option to manually map a single column and auto map for all other columns. Since you mapped the run-id column with input and there is no input for other columns, all other columns don't have data.

    If schemas of all tables are same, you can give map all columns manually by importing schemas. If tables have different structure, you can copy the data from layer2 to layer3 as it is and then update the run-id with script activity. I tried this with sample data. Below is the approach.

    • Three lookup activities are taken. One for getting the list of tables for copying from layer2 to layer3. Second one for getting layer 3 run-id and third lookup activity is for layer-2 run-id.

    enter image description here

    • For each activity iterates the list of items from lookup activity (lookup_table_list) and tables are copied from layer2 to layer3 without changing the run-id to new run-id using copy activity.
    • Script activity is taken and script is written as,
    update @{item().sch_name}.@{item().tab_name}
    set
    Run_id= '@{activity('Look up layer_3 Run_id').output.firstRow.Run_id}'
    where Run_id= '@{activity('Look up layer_2 Run_id').output.firstRow.Run_id}'
    

    This script will assign the Run_Id column with the value from Lookup layer_3 Run_id activity.

    This repro is done by considering run-id is same for all tables in layer-2. If it is table specific, change the logic by adding the lookup activity for getting layer2 run-id inside for-each activity.