I have created a dataset for a CSV file stored in ADLS. It has two different columns with the same header name, [Parent_ID]. Because of this, I am getting an error while mapping it to the sink.
Also, the header starts from the 2nd row, and the dataset is not able to detect the column names correctly. I tried the additional column option, but it didn't work because the column names detected by the dataset are not available.
I want the first one to be [Parent_ID] and the second one to be [End_User_Parent_ID] in the sink.
I am not allowed to use a Dataflow in this case. How can I handle it?
You can follow the suggestion by @SK from this ask.
This is the input that I took in the source CSV file:
one,two,three,four,five
id,Parent_ID,name,Parent_ID,DOB
1,1a123,Rakesh,22,2000-16-02
2,2b24,Laddu,23,2001-16-03
3,89d2,MS,42,1981-07-07
First, give the source and sink datasets to the copy activity.
Now, go to the source dataset and check the First Row as Header checkbox.
As your header is in the 2nd row, skip the first row of the source data by giving a Skip line count of 1 in the copy activity source.
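In JSON terms, the copy activity source at this point would look roughly like the sketch below (the DelimitedText read settings are the relevant part; everything else in your activity stays as-is):

```json
"source": {
    "type": "DelimitedTextSource",
    "formatSettings": {
        "type": "DelimitedTextReadSettings",
        "skipLineCount": 1
    }
}
```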
Now, go to the copy activity mapping and import the schemas. By default, it will give numbered column names to the same-named columns in the source. Change the required column to your name, End_User_Parent_ID, in the sink mapping.
But during the copy activity execution, it won't identify the column names Parent_ID1 and Parent_ID3 in the source mapping, and will give an error like invalid mapping, column not found. To avoid that, go to the source dataset and uncheck First Row as header. Also, as the sink header is already set, there is no need for the source header anymore. As your header is in the 2nd row, update the Skip line count in the copy activity source to 2.
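After this change, the dataset and source settings would look roughly like this (a sketch; only the properties relevant to this step are shown):

```json
// In the source dataset typeProperties:
"firstRowAsHeader": false

// In the copy activity source:
"source": {
    "type": "DelimitedTextSource",
    "formatSettings": {
        "type": "DelimitedTextReadSettings",
        "skipLineCount": 2
    }
}
```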
Now, when no header is selected from the source, it will identify the column names as 1, 2, 3, ... in the mapping. So, edit the mapping of all source columns to 1, 2, 3, ... like below.
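The final translator section of the copy activity would then look something like this sketch, mapping source columns by ordinal position to the desired sink names (the sink column names here match the sample data above):

```json
"translator": {
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "ordinal": 1 }, "sink": { "name": "id" } },
        { "source": { "ordinal": 2 }, "sink": { "name": "Parent_ID" } },
        { "source": { "ordinal": 3 }, "sink": { "name": "name" } },
        { "source": { "ordinal": 4 }, "sink": { "name": "End_User_Parent_ID" } },
        { "source": { "ordinal": 5 }, "sink": { "name": "DOB" } }
    ]
}
```

Because the mapping is by ordinal rather than by name, the duplicate Parent_ID headers in the source no longer matter.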
Now, debug the pipeline, and it will give the required output file.