Search code examples
azureazure-data-factorycloud

How to handle csv file with same column name in Azure Data Factory?


I have created a dataset for a CSV file store in ADLS. It has got two different columns with same header name [Parent_ID]. Beacuse of it I am getting error while mapping it to the sink.

Also the header start from 2nd row, and the dataset is not able to detect column names correctly. I tried using additional column option but it didn't work as the column names detected by dataset are not available.

I want the first one to be [Parent_ID] and second one to be [End_User_Parent_ID] in the Sink.

I am not allowed to use Dataflow in this case. How to handle it?


Solution

  • You can follow the suggestion by @SK from this ask.

    This is the input that I took in the source csv file.

    one,two,three,four,five
    id,Parent_ID,name,Parent_ID,DOB
    1,1a123,Rakesh,22,2000-16-02
    2,2b24,Laddu,23,2001-16-03
    3,89d2,MS,42,1981-07-07
    

    First give the source and sink datasets to the copy activity.

    • Now, Go to source dataset and check First Row as Header checkbox.

    • As your header is in the 2nd row, skip the first row of the source data. So, give the Skip line count as 1 in the copy activity source.

      enter image description here

    • Now, go to copy activity mapping and import the schemas. Here, by default, it will give numbered column names to the same named columns in the source. Change the required column to your name End_User_Parent_ID in the sink mapping.


      enter image description here

    • But the during the copy activity execution, it won't identify the column names Parent_ID1 and Parent_ID3 in the source mapping and will give the error like invalid mapping, column not found. To avoid that, go to source dataset and uncheck the First Row as header.


      enter image description here

    • Also, as the sink header is already set, now there is no need of source header. As your header is in 2nd row, update the Skip line count in the copy activity source to 2.
      enter image description here

    • Now, when there is no header selected from the source, it will identify the column names as 1,2,3,.. in the mapping. So, edit the mapping of all source columns to 1,2,3.. like below.


      enter image description here

    Now, debug the pipeline and it will give the required output file.

    enter image description here