Search code examples
etlazure-data-factory

How can I create two column from one column/column header in Azure Data Factory Pipeline?


I have daily autogenerating files in an onprem file. The file comes in with a "Net Sales 10/28/22" column header but I would like to create another column so there is two columns Net Sales and Net Sales Date. Also, when the new files for the next day is uploaded, I need that one column to be Net Sales and another column to be Net Sales Date. Here is what I am going for OG File and Goal Any help is greatly appreciated.


Solution

  • In Azure Data Factory, I created a dataflow. To get the date from the column header, I used two derived columns.

    • I used the source dataset as a csv file. enter image description here

    • Data preview of source: enter image description here

    • Then I created one derived column using derived column activity to the source file named "columnames". In this column I am taking collection of all column's names in array. Using columnNames() Expression. enter image description here

    • Data preview of derived column1: enter image description here

    • Now, I have created another derived column activity to create two derived columns,

    1. Net sales: In this column I am fetching values from Net sales 10/28/2022 column of original dataset with toInteger(byPosition(3)) expression this returns all values of column at position 3 in dataset
    2. Net sales Date: In this column I am fetching date from Net sales 10/28/2022 column header of original dataset with at(columnnames,3) this returns column name at index number 3 in columnnames array and split(at(columnnames, 3),' ')[4] expression splits the string with delimiter. enter image description here
    • Data preview of derived column2 desired columns: enter image description here

    • Now, select your desired columns from derived column 2 activity output. enter image description here

    • Data preview of select activity and desired output: enter image description here

    when the new files for the next day is uploaded, I need that one column to be Net Sales and another column to be Net Sales Date.

    When I uploaded another file with new date Got output in similar format. Sample file preview: enter image description here Output for above file: enter image description here