azure-data-factory wildcard azure-synapse

Synapse pipeline - Extract actual filename from wildcard file path


We receive a single file, dropped at random weekly intervals, in ADLS Gen2 blob storage. The file name format is YYYYMMDD_WLS.csv. I am using a simple Synapse pipeline with some data flows and copy activities. I have no problem picking up and ingesting the file using *WLS.csv as the file name in the dataset file path, but when I pull itemName from the Get Metadata activity, it returns *WLS.csv rather than the actual file name it processed, such as 20231220_WLS.csv. I need the original file name later to copy the file to an archive and append a post-processing timestamp. I also tried leaving the file name blank and returning childItems, but that returns a mess of additional characters, even when there's only one file.

How can I retrieve the actual filename to then store in a parameter or variable to be used downstream?

Thanks in advance!

I saw another post that extracted the name using a bunch of nested REPLACE functions, but it's somewhat cumbersome to read and the client won't like it.

childItems output, stored in the OriginalFileName variable:

"variableName": "OriginalFileName",
"value": "[{\"name\":\"20231220_WLS.csv\",\"type\":\"File\"}]"

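The extra characters in that output appear to be JSON serialization rather than noise: the childItems array has been stored as text, with its quotes escaped. A minimal Python sketch (Python is used only to illustrate the data shape; it is not part of the pipeline) shows that parsing the value as JSON recovers the name without nested replace calls. The helper name first_file_name is hypothetical.

```python
import json

# Value of the OriginalFileName variable as shown in the question: the
# childItems array stored as a JSON string (quotes escaped).
raw_value = "[{\"name\":\"20231220_WLS.csv\",\"type\":\"File\"}]"


def first_file_name(serialized_child_items: str) -> str:
    """Parse a serialized childItems array and return the first File entry's name.

    Hypothetical helper for illustration; raises ValueError if no file is listed.
    """
    items = json.loads(serialized_child_items)  # list of {"name", "type"} dicts
    names = [item["name"] for item in items if item.get("type") == "File"]
    if not names:
        raise ValueError("childItems contains no files")
    return names[0]


print(first_file_name(raw_value))  # 20231220_WLS.csv
```

The same idea carries over to the pipeline: index into the childItems array while it is still an array, instead of turning it into a string and stripping characters afterwards.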
Solution

  • You can use the following method to get the file name using a wildcard path with the Get Metadata activity:

    If you have a single file in your directory, you can follow the procedure below:

    In the dataset, create a parameter named fileName with the default value *WLS.csv, and set the dataset's file name to the dynamic expression @dataset().fileName.

    In the Get Metadata activity, add Child items to the Field list.

    The Get Metadata activity output then contains a childItems array with one entry per matching file, for example [{"name":"20231220_WLS.csv","type":"File"}].

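    Add a Set Variable activity, create a String variable, and set its value to the dynamic expression @activity('Get Metadata1').output.childItems[0].name, where Get Metadata1 is the name of your Get Metadata activity. This indexes into the array before anything is converted to a string, so the variable holds only the file name.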

    The variable then contains just the file name, for example 20231220_WLS.csv.

    If you have multiple files in the directory, add a ForEach activity with Sequential enabled and set its Items to the dynamic expression @activity('Get Metadata1').output.childItems. Inside the ForEach, add a Set Variable activity that assigns the dynamic expression @item().name to a String variable. Sequential matters here because pipeline variables are shared across iterations, so parallel iterations would overwrite each other's value.

    Debug the pipeline, and each file name will be stored in the variable in turn, one per iteration.
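The single-file and multi-file steps above can be mirrored outside the pipeline with a small Python sketch, which may help when explaining the flow to the client. It is an illustration under assumptions, not Synapse code: the folder listing is hypothetical, fnmatchcase stands in for the service's wildcard matching (the two may differ in edge cases), and the archive naming scheme <name>_<yyyyMMddHHmmss>.csv is an assumed format based on the question's requirement to append a post-processing timestamp.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical folder listing; in the pipeline this comes from ADLS Gen2.
blob_names = ["20231220_WLS.csv", "readme.txt", "20231227_WLS.csv"]


def get_metadata_child_items(names: list[str], pattern: str = "*WLS.csv") -> dict:
    """Approximate the Get Metadata childItems output for a wildcard file name.

    fnmatchcase is only a stand-in for the service's wildcard filter.
    """
    return {"childItems": [{"name": n, "type": "File"}
                           for n in names if fnmatchcase(n, pattern)]}


def archive_name(file_name: str, processed_at: datetime) -> str:
    """Append a processing timestamp to the file stem (assumed naming scheme)."""
    path = PurePosixPath(file_name)
    return f"{path.stem}_{processed_at:%Y%m%d%H%M%S}{path.suffix}"


output = get_metadata_child_items(blob_names)

# Single file: mirrors @activity('Get Metadata1').output.childItems[0].name
single = output["childItems"][0]["name"]

# Multiple files: mirrors a ForEach over childItems with @item().name
all_names = [item["name"] for item in output["childItems"]]

# Fixed timestamp so the example is reproducible; a real run would use "now".
stamp = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(single)                       # 20231220_WLS.csv
print(all_names)                    # ['20231220_WLS.csv', '20231227_WLS.csv']
print(archive_name(single, stamp))  # 20231220_WLS_20240102030405.csv
```

With more than one match, childItems[0] simply takes the first entry, which is why the ForEach variant is needed. Inside the pipeline, the archive name could presumably be built with an expression along the lines of @concat(replace(variables('FileName'), '.csv', ''), '_', formatDateTime(utcNow(), 'yyyyMMddHHmmss'), '.csv'), where FileName is a hypothetical variable holding the extracted name.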