I have 10 files with different column counts, but the destination is the same for all files. As we know, the Copy activity always expects the same number of columns in the source as in the destination. My plan is to add the columns dynamically in the Copy activity and pass null to the destination. Is there any way to dynamically create a column if it is not coming from the source file?
I tried multiple things, but nothing works; every time I get an unexpected type '|List`1' error.
The Copy activity cannot add columns dynamically based on the target schema. Instead, you can use a dataflow for this.
For this, create 3 datasets.
The first is for the source files list. In this, give the path only up to the source folder.
The second is also for the source, but here create a dataset parameter filename and use it as the file name of the dataset: @dataset().filename.
The last dataset is for the target file. Give the target file path in this dataset.
These are my sample source files:
Sample1.csv
col4,col3,col2,col5,col1
mycol4,1,22,mycol5,B
mycol4,2,23,mycol5,B
Sample2.csv
col8,col3,col2,col9,col4
mycol8,1,22,mycol9,mycol4
mycol8,2,23,mycol9,mycol4
Sample3.csv
col1,col3,col2,col6,col7,col8
Rakesh,1,22,2000-16-02,B,C
Laddu,2,23,2001-16-03,B,C
MS,3,42,1981-07-07,B,C
In the target csv file, I took only the headers. Your target file may contain data.
col1,col2,col3,col4,col5,col6,col7,col8,col9
In the dataflow, give the target dataset as one source and the source dataset with the parameter as another source.
In the target dataset source, import the projection and make sure all data types match the source data. Here, I set all data types to string.
In the source, don't import any projection and clear the schema in the dataset as well.
Take a union transformation after TargetAsSource and union it with Source, setting Union by to Name.
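Conceptually, the union-by-name step is a column-aligned concatenation: rows line up on column names, and any column missing from a source file is filled with nulls. A minimal pandas sketch of that behavior (the file contents are inlined from the samples above; this is an illustration, not the actual dataflow):

```python
import io
import pandas as pd

# Target schema: headers only, matching the target CSV above
target = pd.read_csv(io.StringIO("col1,col2,col3,col4,col5,col6,col7,col8,col9\n"))

# One of the sample source files (Sample1.csv), with fewer, reordered columns
source = pd.read_csv(io.StringIO(
    "col4,col3,col2,col5,col1\n"
    "mycol4,1,22,mycol5,B\n"
    "mycol4,2,23,mycol5,B\n"
))

# Union by name: rows align on column names; columns absent from the
# source (col6..col9 here) come through as NaN (null)
combined = pd.concat([target, source], ignore_index=True)
print(list(combined.columns))
```

Because the target frame is listed first, the result keeps the full target column order, which is what the sink writes out.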
Next, add the sink. The sink dataset should be the target dataset. Go to the sink settings -> set the File name option to Output to single file and give the actual file name in that option.
Now, create a pipeline and take a Get Metadata activity with the first source dataset created earlier. In the field list of the Get Metadata activity, add Child items. This gives the list of all file names in the source folder.
Take a ForEach activity after this and pass the child items array @activity('Get Metadata1').output.childItems to the ForEach Items expression. Make sure you check the Sequential checkbox in the ForEach activity.
Inside the ForEach, take a Data flow activity and give it the above dataflow. Pass @item().name to the source dataset parameter filename.
Now, debug the pipeline. After the run, the data from all of your source files will be combined, and any missing columns will be added with null values.
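The whole flow (Get Metadata listing the files, ForEach looping over them, and the dataflow padding each file out to the target schema) can be approximated locally with the standard csv module. The folder listing is stood in by an inlined dict here; everything is illustrative, not ADF API:

```python
import csv
import io

# The target header defines the full output schema
target_header = ["col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9"]

# Stand-ins for the files Get Metadata would return as childItems
files = {
    "Sample1.csv": "col4,col3,col2,col5,col1\nmycol4,1,22,mycol5,B\n",
    "Sample2.csv": "col8,col3,col2,col9,col4\nmycol8,1,22,mycol9,mycol4\n",
}

rows = []
for name, content in files.items():          # ForEach over childItems
    reader = csv.DictReader(io.StringIO(content))
    for record in reader:
        # Columns missing from this file get empty (null) values,
        # mirroring what the union-by-name dataflow writes to the sink
        rows.append({col: record.get(col, "") for col in target_header})

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=target_header)
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```

Each output row has all nine columns in target order regardless of which subset the source file supplied, which is exactly the combined file the pipeline produces.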