foreach  copy  azure-data-factory  azure-data-lake-gen2

Copying CSV files from ADLS to ADLS not working in ADF pipeline


[Screenshot: sink of my copy activity]

[Screenshot: source of my copy activity]

I want to copy the CSV files in an ADLS folder called source to another ADLS folder called destination, using a copy activity in an ADF pipeline.

1st activity = GetMetadata to get all the files from the source folder.

2nd activity = ForEach to loop through the files.

Inside the 2nd activity = copy activity to copy from source to destination.

My source dataset correctly points to the source folder, and the output dataset correctly points to the destination folder.

I'm using the same filenames in the destination as in the source ADLS folder.

The pipeline runs without any error, but the data in the source files is not present in the destination files.

The filenames in the destination match those in the source, which is what I wanted.

But the data is incorrect.

For example let's say I have 5 files in source as follows:

  • Classone_00000.csv
  • Classone_00001.csv
  • Classtwo_00000.csv
  • Classtwo_00001.csv
  • Classtwo_00003.csv

I'm using partitioning and have specified 5 rows per file.

In the destination I still get all 5 files with the same filenames, but the data is incorrect. In my case the data from Classone_00000.csv is present in all 4 other files, and where a source file had 5 rows, the destination file has only 4 rows.

I'm not able to figure out why that's happening.

I have tried making the foreach sequential and not sequential, but nothing seems to be working.

I just want the data present in adls source folder to get copied exactly to the adls destination folder.


Solution

  • In your approach, the copy activity reads all of the source files in every iteration, so each target file is overwritten with the data of the last source file that was read. This is probably the reason for your incorrect data.

    To achieve your requirement, there is no need to use a loop. You can do the same with a single copy activity in the pipeline.

    In the source dataset, give the path only up to the container, and give your wildcard path in the source settings of the copy activity.

    In the target dataset, give the path up to your target folder. Don't give any file name.

    Use this dataset as the copy activity sink.

    Execute the pipeline and all files in the source location will be copied to the target location.

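For reference, the single copy activity described above might look roughly like the following in the pipeline's JSON view. This is only a sketch under assumptions: the activity name and the dataset names `SourceContainerCsv` and `DestinationFolderCsv` are hypothetical, the wildcard values assume the files sit in a folder named `source` and end in `.csv`, and both datasets are assumed to be DelimitedText datasets on an ADLS Gen2 linked service. Adjust them to match your own pipeline.

```json
{
  "name": "CopyAllCsvFiles",
  "description": "Sketch only: activity and dataset names are hypothetical placeholders.",
  "type": "Copy",
  "inputs": [
    { "referenceName": "SourceContainerCsv", "type": "DatasetReference" }
  ],
  "outputs": [
    { "referenceName": "DestinationFolderCsv", "type": "DatasetReference" }
  ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "source",
        "wildcardFileName": "*.csv"
      },
      "formatSettings": { "type": "DelimitedTextReadSettings" }
    },
    "sink": {
      "type": "DelimitedTextSink",
      "storeSettings": { "type": "AzureBlobFSWriteSettings" },
      "formatSettings": {
        "type": "DelimitedTextWriteSettings",
        "fileExtension": ".csv"
      }
    }
  }
}
```

Here the source dataset would carry only the container (file system) in its location, while the sink dataset would carry the container plus the destination folder and no file name, so the wildcard in the copy activity source decides which files are picked up.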