
Data Factory/Synapse copy task fails with staging enabled when no files are found


We have a pipeline that copies data from parquet files stored in our ADLS Gen2 data lake to a Synapse dedicated SQL pool. The pipeline is metadata driven: it loops through a list of source containers and processes new files in each of them by filtering on the last modified start time. Staging is enabled because we add a number of columns for logging purposes before the data is inserted into the dedicated pool.

[Screenshot: configuration of the copy task]

The pipeline runs great, except when no new files are found. The staging step of the copy activity is still executed, but the step from staging to Synapse fails because the temporary staging location cannot be found as no files have been placed there:

[Screenshot: the second step (staging to Synapse) fails]

I've been searching for a way to skip the copy task altogether if no files are found, but the tricky part is that the copy task searches recursively through the raw/data folder for new files (the final structure is raw/data/yyyy/mm/dd/filename.parquet). The Get Metadata activity, which would usually be the obvious solution, cannot search recursively, so I have no way of checking whether new files exist before the copy activity runs. I have seen posts that use dynamic variables to build a recursive list for the Get Metadata activity, but that creates too much overhead to be a workable solution. Does anyone have an idea how to prevent the copy activity from failing in this scenario, or an easy way to skip it?
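
The check being asked for boils down to one question: does any file matching raw/data/yyyy/mm/dd/*.parquet have a last modified time at or after the start time? The minimal Python sketch below shows only that logic; it is not ADF code. It assumes the listing arrives as (name, is_directory, last_modified) tuples, roughly the shape of the items returned by `FileSystemClient.get_paths(recursive=True)` in the azure-storage-file-datalake SDK. The sample paths and start time are made up, and the inclusive start (`>=`) is an assumption meant to mirror the copy activity's last-modified filter.

```python
import re
from datetime import datetime, timezone

# Layout from the question: raw/data/yyyy/mm/dd/filename.parquet
FILE_PATTERN = re.compile(r"raw/data/\d{4}/\d{2}/\d{2}/[^/]+\.parquet")


def find_new_files(paths, start_time):
    """Return the parquet files under raw/data modified at or after start_time.

    `paths` is an iterable of (name, is_directory, last_modified) tuples,
    standing in for the entries of a recursive listing of the container.
    """
    return [
        name
        for name, is_directory, last_modified in paths
        if not is_directory
        and FILE_PATTERN.fullmatch(name)
        and last_modified >= start_time  # inclusive start (assumption)
    ]


# Hypothetical listing and start time, for illustration only
start = datetime(2023, 1, 2, tzinfo=timezone.utc)
listing = [
    ("raw/data/2023", True, datetime(2023, 1, 1, tzinfo=timezone.utc)),
    ("raw/data/2023/01/01/a.parquet", False, datetime(2023, 1, 1, 6, tzinfo=timezone.utc)),
    ("raw/data/2023/01/02/b.parquet", False, datetime(2023, 1, 2, 6, tzinfo=timezone.utc)),
]

new_files = find_new_files(listing, start)
print(new_files)  # ['raw/data/2023/01/02/b.parquet']
# Only run the (staged) copy when this list is non-empty
print(bool(new_files))  # True
```

The regex pins the exact folder depth; it would need loosening if files can also sit at other levels under raw/data.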


Solution

  • Because your folder structure is the same for every parquet file, you can use a Get Metadata activity to list all the files and add a filter condition so that only the latest files are returned.

    I tested this with sample files placed in the same folder structure.


    I used a dataset parameter as the placeholder for the wildcard path.


    With the filter applied, the Get Metadata output contained only one file (the latest one) in childItems.


    If the childItems array is empty (length 0), there are no new files. So check whether the length of the childItems array equals zero.

    Add an If Condition activity with the following expression:

    @not(equals(length(activity('Get Metadata1').output.childItems),0))
    

    Put your copy activity in the True activities, so that it runs only when there are new files in the storage.

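
To sanity-check the gating logic outside ADF, here is a small Python sketch of the same If Condition. It assumes a Get Metadata output shaped like {"childItems": [{"name": ..., "type": ...}]}; the container names and the `copy` callback are hypothetical stand-ins for the ForEach item and the copy activity in the metadata-driven loop from the question.

```python
from typing import Callable


def should_run_copy(get_metadata_output: dict) -> bool:
    """Mirror of @not(equals(length(activity('Get Metadata1').output.childItems),0))."""
    # Index directly so a missing childItems field raises instead of silently skipping
    child_items = get_metadata_output["childItems"]
    return not (len(child_items) == 0)


def process_container(name: str, get_metadata_output: dict,
                      copy: Callable[[str], None]) -> str:
    """Call `copy` only in the True branch, like the copy activity inside the If Condition."""
    if should_run_copy(get_metadata_output):
        copy(name)
        return "copied"
    # False branch: the copy (and its staging step) is never started
    return "skipped"


# Hypothetical Get Metadata outputs for two containers in the ForEach loop
outputs = {
    "container-a": {"childItems": [{"name": "b.parquet", "type": "File"}]},
    "container-b": {"childItems": []},
}
copied = []
results = {name: process_container(name, out, copied.append)
           for name, out in outputs.items()}
print(results)  # {'container-a': 'copied', 'container-b': 'skipped'}
```

Leaving the False branch empty is the point of the design: a container with no new files simply skips the copy instead of reaching the staging-to-Synapse step with nothing to load.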