Search code examples
azure-data-factorywildcardazure-synapsedynamic-content

Synapse pipeline - extract year and country from a filename in a wildcard path


I have files named File 2024 US adj.csv. I have a wild card path in a copy activity which ingests all files in an ADLS location and passes the year as a wild card. I'm trying to create two new columns in the files for year and country but don't know how to write the dynamic content to be able to extract these values from the file name. Any help is greatly appreciated.


Solution

    • You can add filename as a column in the additional column. But it is not possible to split the column data into year and country directly.

    • To do this, you can read all files in an ADLS using Get metadata activity and filter the files that start with "2024" using filter activity. Then loop through each file and extract the year and country from the file name using the substring function in for-each activity. Below is the detailed explanation of pipeline.
    1. GetMetadata1: This activity gets metadata for all child items in the ADLS location and returns the childItems field.

    2. Filter1:

    Expression:

    Items: @activity('Get Metadata1').output.childItems

    Condition: @startswith(item().name,'2024')

    This activity filters the child items to include only files that start with "2024".

    1. ForEach1:

    Expression: Items:@activity('Filter 1').output.value

    This activity loops through each file and copies the data to a sink dataset.

    3.1. copy activity: Take the copy activity inside the for-each activity. In source settings, add additional columns.

    Additional columns expression:

    year: @substring(item().name,0,4) country: @substring(item().name,5,2)

    The year and country columns are added as additional columns in the source dataset.