I am using Azure Data Factory to migrate data from source systems onto Azure SQL Database.
I'll refer to each layers by their number.
Between layers 1 & 2, a Run_id is generated and is a new column added to each table loaded via a Copy activity in Azure Data Factory. The new column is simply added to the end of table to match against each run of the pipelines.
At this current stage, between layer 2 & 3 is a simple one-to-one mapping, however, I also generate a new run_id for each run in layer 3, and so I do not want to keep the data in the Run_id column from layer 2, I want to exclude mapping that column and instead map the newly generated Run_id in layer 3 to each of the tables in layer 3, similar to how I've done in layer 2.
The issue stems from the fact that the column names Run_id are identical and ADF automatically assumes I want to carry over that existing data in layer 2 to layer 3.
I have parametrised my data migration to look for a list of tables to import in a config table which ADF cycles through via a ForEach activity and hence I cannot simply edit the mappings on the fly as my single copy activity cycles through 50+ tables:
I have tried to add a specific mapping like in the image attached:
However when I do so, the actual data I want copied over doesn't copy and only the correct Run_id copies, as you can see in the image attached:
I have thought out swapping my copy activity for a parametrised stored procedure instead which could give me more flexibility
I have also thought about changing the run_id column name for layer 2 so there's a difference in column names but I am keen on preserving column name Run_id in both layer 2 and layer 3.
Essentially, between layer 2 & 3, I need to copy over the data in every column EXCEPT the Run_id column where I need to assign a different value to the value which is in layer 2. Suppose layer 2:
Data | Run_id |
---|---|
B | 100 |
A | 100 |
What I need in Layer 3:
Data | Run_id |
---|---|
B | [NEW Run_id] |
A | [NEW Run_id] |
Thanks and please let me know if you need any clarification.
In mapping settings of copy activity, you need to select either auto-mapping of all columns or manual mapping. There is no option to manually map a single column and auto map for all other columns. Since you mapped the run-id column with input and there is no input for other columns, all other columns don't have data.
If schemas of all tables are same, you can give map all columns manually by importing schemas. If tables have different structure, you can copy the data from layer2 to layer3 as it is and then update the run-id with script activity. I tried this with sample data. Below is the approach.
update @{item().sch_name}.@{item().tab_name}
set
Run_id= '@{activity('Look up layer_3 Run_id').output.firstRow.Run_id}'
where Run_id= '@{activity('Look up layer_2 Run_id').output.firstRow.Run_id}'
This script will assign the Run_Id column with the value from Lookup layer_3 Run_id activity.
This repro is done by considering run-id is same for all tables in layer-2. If it is table specific, change the logic by adding the lookup activity for getting layer2 run-id inside for-each activity.