azure-data-factory, azure-data-lake

How to compare two files with different filenames in ADF


We have multiple files in an Azure Data Lake Gen2 folder, with file names like abc_test_20170505_120101.csv, hash_abc_test_20170505_120101.csv, abc_sample_20170505_110101.csv, and hash_abc_sample_20170505_110101.csv.

I want to copy abc_test_20170505_120101.csv together with hash_abc_test_20170505_120101.csv,

and abc_sample_20170505_110101.csv together with hash_abc_sample_20170505_110101.csv.

In other words, each data file should be copied along with its matching hash_-prefixed file.

How can we do this using Azure Data Factory?


Solution

    • Since there is a metadata file for every data file, you can use the following activities to build an array of objects, where each object has keys called data and metadata whose values are the respective file names.

    • First, I used a Get Metadata activity to get the list of files:


    • Iterate through these child items with a ForEach activity. Inside the loop, use an If Condition activity to check whether the item's name starts with hash; we only want the names that do not. The following is the dynamic content I used as the condition:
    @not(startswith(item().name,'hash'))
    

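For clarity, the filter that the If Condition performs can be sketched in Python. This is only an illustration of the expression's logic, not ADF code; the file names are the examples from the question.

```python
# Mimic the If Condition's check: keep only names that do NOT start
# with 'hash' (the Python analogue of @not(startswith(item().name,'hash'))).
files = [
    "abc_test_20170505_120101.csv",
    "hash_abc_test_20170505_120101.csv",
    "abc_sample_20170505_110101.csv",
    "hash_abc_sample_20170505_110101.csv",
]
data_files = [name for name in files if not name.startswith("hash")]
print(data_files)
```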

    • Inside the True path, use a Set Variable activity to build the object. The following is the dynamic content I used:
    {
        "data": "@{item().name}",
        "metadata":"@{concat('hash_',item().name)}"
    }
    
    • Now convert this string variable to an object using the json() function and append it to an array with an Append Variable activity.
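Taken together, the Set Variable, json(), and Append Variable steps build a list of paired objects. A minimal Python sketch of the same logic, using the question's example file names (illustrative only, not what ADF executes):

```python
# For each data file, build an object pairing it with its hash_ metadata file,
# mirroring the Set Variable string -> json() -> Append Variable sequence.
import json

files = [
    "abc_test_20170505_120101.csv",
    "hash_abc_test_20170505_120101.csv",
    "abc_sample_20170505_110101.csv",
    "hash_abc_sample_20170505_110101.csv",
]

pairs = []
for name in files:
    if not name.startswith("hash"):
        # Equivalent of the dynamic content string, then the json() conversion.
        obj_str = json.dumps({"data": name, "metadata": "hash_" + name})
        pairs.append(json.loads(obj_str))

print(pairs)
```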


    • After executing the pipeline, you will get the results shown in the image below:


    • If there is no metadata file for a particular data file, you can add conditions to check for that after this process.
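One way to sketch such a check (a hypothetical Python illustration, not an ADF activity): for each data file, confirm the expected hash_ partner is actually present in the folder listing, and flag any that are missing.

```python
# Flag data files whose expected hash_ metadata file is absent from the listing.
# In this example listing, abc_sample has no hash_ partner.
files = [
    "abc_test_20170505_120101.csv",
    "hash_abc_test_20170505_120101.csv",
    "abc_sample_20170505_110101.csv",
]
missing = [
    name for name in files
    if not name.startswith("hash") and ("hash_" + name) not in files
]
print(missing)
```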