In my Azure Data Factory I need to copy data from an SFTP source that organizes its data into date-based directories with the following hierarchy: year -> month -> date -> file.
I have created a linked service and a binary dataset, where the dataset's "filesystem" points to the host and "Directory" points to the folder that contains the year directories, e.g. host/exampledir/yeardir/, with yeardir containing the year directories.
When I hard-code the folder "2015" in the dataset, it copies the entire 2015 folder. However, if I use a parameter for the directory and pass in the same folder path from a copy activity, it instead creates a file called "2015" in my blob storage that contains no data.
My current workaround is a nested sequence of Get Metadata activities and ForEach loops that drill into each folder and subfolder and copy the individual files at the end. The desired result, however, is for the single binary dataset to copy each folder without the need for Get Metadata.
Is this possible within the scope of the data factory?
Edit:
[Image: properties used in copy activity]
To add further context: I have tried manually writing the file path into the copy activity, as shown in the photo. I have also tried using variables, dynamic content for the parameter (using the base file path and concat), and putting the base file path into the dataset alongside @dataset().filePath. None of these approaches have worked so far; they either copy nothing or create the empty file I mentioned earlier.
The sink is a binary dataset linked to Azure Data Lake Storage Gen2.
Update:
The accepted answer is the solution. My problem was that the value retrieved from the source dataset had a newline at the end when it was passed as a parameter. I used concat to clean this up, and it has worked since then.
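To illustrate why a trailing newline breaks the folder path, here is a minimal Python sketch. It is not Data Factory code and not the exact concat expression I used; the function name build_folder_path is hypothetical and only shows the kind of cleanup that is needed before the path is handed to the dataset parameter.

```python
def build_folder_path(base_dir: str, folder_name: str) -> str:
    """Join a base directory and a folder name, dropping stray whitespace.

    Hypothetical helper for illustration only; it mimics cleaning a value
    before it is used as a dataset folder path.
    """
    # strip() removes a trailing "\n" that would otherwise become part of the path
    clean_base = base_dir.strip().rstrip("/")
    clean_name = folder_name.strip().strip("/")
    return f"{clean_base}/{clean_name}"


# A folder name that arrived with a trailing newline
raw_value = "2015\n"

# Naive concatenation keeps the newline, so the path no longer matches the folder
naive_path = "exampledir/yeardir/" + raw_value

# The cleaned path points at the real folder
clean_path = build_folder_path("exampledir/yeardir/", raw_value)

print(repr(naive_path))
print(repr(clean_path))
```

Printing with repr makes the hidden newline visible, which is easy to miss when looking at the parameter value in the pipeline run output.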
Since giving exampledir/yeardir/2015 worked perfectly for you and you want to copy all the folders present in exampledir/yeardir, you can follow the procedure below:
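Use a Get Metadata activity to get the child items of the folder exampledir/yeardir/ (in my demonstration, I have taken the path as 'maindir/yeardir').
Loop over the result with a ForEach activity, giving its items value as @activity('Get Metadata1').output.childItems
Inside the loop, use a copy activity with the source path given as maindir/yeardir/@{item().name}
Give the sink path as outputDir/@{item().name}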
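The path mapping performed by the loop can be sketched in plain Python as follows. This is only a simulation of the logic, not pipeline code: the function plan_folder_copies is hypothetical, the sample list imitates the name/type entries that Get Metadata returns in childItems, and skipping non-folder entries is an assumption added for the sketch.

```python
from typing import Dict, List, Tuple


def plan_folder_copies(
    child_items: List[Dict[str, str]],
    source_base: str = "maindir/yeardir",
    sink_base: str = "outputDir",
) -> List[Tuple[str, str]]:
    """Return (source_path, sink_path) pairs, one per year folder.

    Mirrors the expressions maindir/yeardir/@{item().name} and
    outputDir/@{item().name} evaluated once per loop iteration.
    """
    plans = []
    for item in child_items:
        # Assumption for this sketch: only folders are copied, files are skipped
        if item.get("type") != "Folder":
            continue
        name = item["name"]
        plans.append((f"{source_base}/{name}", f"{sink_base}/{name}"))
    return plans


# Sample data shaped like the childItems output of Get Metadata
sample_child_items = [
    {"name": "2015", "type": "Folder"},
    {"name": "2016", "type": "Folder"},
    {"name": "readme.txt", "type": "File"},
]

for source_path, sink_path in plan_folder_copies(sample_child_items):
    print(f"copy {source_path} -> {sink_path}")
```

Each pair corresponds to one run of the copy activity inside the loop, so every year folder lands in its own folder under the sink directory.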
Since giving the path manually as exampledir/yeardir/2015 worked, we get the list of year folders using the Get Metadata activity, loop through each of them, and copy each folder with the source path exampledir/yeardir/<current_iteration_year_folder>.
Because of how I have given my sink path, each folder is copied along with its contents. The following is a reference image.