
Azure Data Factory: How to Prevent Data Flow Execution When No File Is Found by the Copy Activity?


I have an Azure Data Factory pipeline structured as follows:

  1. Copy Activity: Searches for a file on an SFTP server using a specific wildcard and timestamp, and moves it to Blob Storage.
  2. Data Flow: On success of the Copy Activity, retrieves the file from Blob Storage, compares its contents with a SQL table, and copies non-matching rows to the SQL table.

The problem is that even when there is no matching file on the SFTP server, the pipeline still triggers the Data Flow.

How can I configure the Copy Activity to prevent the Data Flow from executing when no file is found and moved to Blob Storage?


Solution

  • One alternative: use a Get Metadata activity to check that the file exists in Blob Storage before triggering the Data Flow.

    General flow

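    After the Copy activity, a Get Metadata activity checks whether the file exists in Blob Storage, and an If Condition activity runs the Data Flow only when it does.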

    Dataset

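    A dataset with a parameter (ADLS Gen2 Parquet in my case, but any dataset type works) lets you set the file name dynamically.

    Parameters

    Define a string parameter on the dataset, for example fileName (the name is arbitrary).

    Connection

    In the dataset's file path, set the file name to dynamic content that references the parameter, for example @dataset().fileName.

    The file name is then supplied by whichever activity uses the dataset.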
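
    As a rough sketch, a parameterized dataset along these lines might look as follows in JSON. The dataset name, linked service name, file system, folder, and parameter name are all placeholders chosen for illustration, not values from the original pipeline.

    ```json
    {
      "name": "BlobFileDataset",
      "properties": {
        "description": "Sketch only: every name and path here is a placeholder",
        "linkedServiceName": {
          "referenceName": "AzureDataLakeStorage1",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "fileName": { "type": "string" }
        },
        "type": "Parquet",
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "landing",
            "folderPath": "incoming",
            "fileName": {
              "value": "@dataset().fileName",
              "type": "Expression"
            }
          }
        }
      }
    }
    ```

    The only part that matters for this approach is the fileName entry: its value is resolved from the dataset parameter each time an activity uses the dataset.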

    Metadata activity

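    The Get Metadata activity needs little configuration: point it at the parameterized dataset and add Exists to its field list, so that the activity output contains an exists property.

    You can pass variables to the dataset parameter to dynamically populate the file name or path.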
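
    A minimal sketch of such an activity in the pipeline JSON, assuming a dataset called BlobFileDataset with a fileName parameter, a preceding activity called Copy data1, and a pipeline variable called fileName (all hypothetical names):

    ```json
    {
      "name": "Get Metadata1",
      "description": "Sketch only: dataset, parameter, variable, and upstream activity names are placeholders",
      "type": "GetMetadata",
      "dependsOn": [
        {
          "activity": "Copy data1",
          "dependencyConditions": ["Succeeded"]
        }
      ],
      "typeProperties": {
        "dataset": {
          "referenceName": "BlobFileDataset",
          "type": "DatasetReference",
          "parameters": {
            "fileName": {
              "value": "@variables('fileName')",
              "type": "Expression"
            }
          }
        },
        "fieldList": ["exists"]
      }
    }
    ```

    With exists in the field list, the activity reports whether the file is there in its output instead of failing when the file is missing.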

    If condition

    Place the Data Flow inside the True branch of the If Condition, so that it runs only when the file exists. Use this expression as the condition: @activity('Get Metadata1').output.exists
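
    For illustration, the If Condition could be defined roughly like this. The activity name If file exists, the Data Flow activity name, and the data flow reference dataflow1 are assumptions; only the expression comes from the answer above.

    ```json
    {
      "name": "If file exists",
      "description": "Sketch only: activity and data flow names are placeholders",
      "type": "IfCondition",
      "dependsOn": [
        {
          "activity": "Get Metadata1",
          "dependencyConditions": ["Succeeded"]
        }
      ],
      "typeProperties": {
        "expression": {
          "value": "@activity('Get Metadata1').output.exists",
          "type": "Expression"
        },
        "ifTrueActivities": [
          {
            "name": "Data flow1",
            "type": "ExecuteDataFlow",
            "typeProperties": {
              "dataflow": {
                "referenceName": "dataflow1",
                "type": "DataFlowReference"
              }
            }
          }
        ]
      }
    }
    ```

    Leaving the False branch empty means the pipeline simply finishes without running the Data Flow when no file was found.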

    Edit 1

    Following a request: dynamically created file names

    To set the file name property of the dataset dynamically, click "Add dynamic content" on that field.

    Using a variable

    Let's say we want to dynamically set the output file name of the file retrieved from SFTP.

    1. Add a variable

    Add a variable to the pipeline on the pipeline's Variables tab.

    2. Add a Set Variable activity

    Select the variable in the activity and write your dynamic expression as its value.

    3. Consume the variable

    Pass the variable's value to the dataset parameter, for example @variables('fileName') for a variable named fileName.
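
    The three steps above might translate into a pipeline fragment like the following sketch. The pipeline name, the variable name fileName, and the naming pattern in the expression are made up for illustration; substitute your own expression.

    ```json
    {
      "name": "pipeline1",
      "properties": {
        "description": "Sketch only: names and the file name pattern are placeholders",
        "variables": {
          "fileName": { "type": "String" }
        },
        "activities": [
          {
            "name": "Set fileName",
            "type": "SetVariable",
            "typeProperties": {
              "variableName": "fileName",
              "value": {
                "value": "@concat('export_', formatDateTime(utcNow(), 'yyyyMMdd'), '.parquet')",
                "type": "Expression"
              }
            }
          }
        ]
      }
    }
    ```

    Any later activity that uses the parameterized dataset can then pass @variables('fileName') as the parameter value.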

    The updated flow

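    The Set Variable activity is placed before the activities that consume the variable, so the file name is resolved before they run.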