azure, azure-data-factory

ADF copy activity: add additional columns if not available in source


I have 10 files which have different column counts, but the destination is the same for all files. As we know, the copy activity always expects the same number of columns in the source as in the destination. My plan is to add the columns dynamically in the copy activity and pass null to the destination. Do we have any way to dynamically create a column if it is not coming from the source file?

I tried multiple things but nothing worked; every time I get an unexpected type '|List`1' error.


Solution

  • Copy activity cannot add columns dynamically based on the target data. Instead, you can use dataflow for this.

    For this, create 3 datasets.

    • One is for the source files list. In this, give the path only up to the source folder.

    • The second one is also for the source, but here, create a dataset parameter filename and use it as the file name of the dataset with @dataset().filename (see the JSON sketch after this list).


    • The last dataset is for the target file. Give target file path in this dataset.
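
    For reference, a parameterized source dataset exported to JSON looks roughly like the sketch below. The linked service name, container, and folder path are placeholders for your own storage; the parts this solution relies on are the filename parameter and the @dataset().filename expression used as the file name.

    {
        "name": "SourceFileDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorage1",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "filename": { "type": "string" }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "fileName": { "value": "@dataset().filename", "type": "Expression" },
                    "folderPath": "sourcefiles",
                    "container": "data"
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            },
            "schema": []
        }
    }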

    These are my sample source files:

    Sample1.csv

    col4,col3,col2,col5,col1
    mycol4,1,22,mycol5,B
    mycol4,2,23,mycol5,B
    

    Sample2.csv

    col8,col3,col2,col9,col4
    mycol8,1,22,mycol9,mycol4
    mycol8,2,23,mycol9,mycol4
    

    Sample3.csv

    col1,col3,col2,col6,col7,col8
    Rakesh,1,22,2000-16-02,B,C
    Laddu,2,23,2001-16-03,B,C
    MS,3,42,1981-07-07,B,C
    

    In the target CSV file, I took only the headers. Your target file may contain data.

    col1,col2,col3,col4,col5,col6,col7,col8,col9
    

    In the dataflow, give the target dataset as one source and source dataset with parameter as another source.

    In the source transformation that reads the target dataset, import the projection and make sure all data types match the source data. Here, I set all data types to string.


    In the source, don't import any projection and clear the schema in the dataset as well.


    Take a union transformation after TargetAsSource and union it with Source by Name.


    Next, add a sink. The sink dataset should be the target dataset. Go to the sink Settings -> set the File name option to Output to single file and give the actual file name in the Output to single file field.

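    The data flow above corresponds roughly to the following data flow script (visible through the Script button in the designer). This is a sketch: the output() column list mirrors the target header, the stream names match the ones used above, and combined.csv is an example file name; the exact sink properties ADF generates for Output to single file may differ slightly. Because the union is done by name, any column in the target header that a given source file does not have comes through as NULL.

    source(output(
            col1 as string,
            col2 as string,
            col3 as string,
            col4 as string,
            col5 as string,
            col6 as string,
            col7 as string,
            col8 as string,
            col9 as string
        ),
        allowSchemaDrift: true,
        validateSchema: false) ~> TargetAsSource
    source(allowSchemaDrift: true,
        validateSchema: false) ~> Source
    TargetAsSource, Source union(byName: true) ~> Union1
    Union1 sink(allowSchemaDrift: true,
        validateSchema: false,
        partitionFileNames:['combined.csv'],
        partitionBy('hash', 1)) ~> Sink1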

    Now, create a pipeline and take a Get Metadata activity with the first source dataset that you created earlier. In the Field list of the Get Metadata activity, select Child items. This will give the list of all file names in the source folder.

    Take a For-Each activity after this and give the child items array @activity('Get Metadata1').output.childItems to the For-Each items expression. Make sure you check the Sequential checkbox in the For-Each activity.

    Inside the For-Each, take a Data flow activity and give the above dataflow to it. Give @item().name to the source dataset parameter filename, as shown in the sketch below.

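    Exported to JSON, the pipeline side looks roughly like the sketch below. The pipeline, dataset, and data flow names are placeholders, and the datasetParameters property under the data flow reference is my assumption about how the UI stores the filename value; if it looks different in your version, simply set @item().name on the Data flow activity's source dataset parameter in the UI as described above.

    {
        "name": "CombineSourceFiles",
        "properties": {
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "isSequential": true,
                        "items": {
                            "value": "@activity('Get Metadata1').output.childItems",
                            "type": "Expression"
                        },
                        "activities": [
                            {
                                "name": "Data flow1",
                                "type": "ExecuteDataFlow",
                                "typeProperties": {
                                    "dataflow": {
                                        "referenceName": "UnionDataflow",
                                        "type": "DataFlowReference",
                                        "datasetParameters": {
                                            "Source": { "filename": "@item().name" }
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }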

    Now, debug the pipeline. After the run, the data from all of your source files will be combined, and any columns missing from a file will be added with null values.
