Search code examples
azureazure-data-factorywildcard-mapping

ADF Copy Activity problem with wildcard path


I have a seemingly simple task to integrate multiple json files that are residing in a data lake gen2 The problem is files that need to be integrated are located in multiple folders, for example this is a typical structure that I am dealing with: Folder1\Folder2\Folder3\Folder4\Folder5\2022\Month\Day\Hour\Minute\ <---1 file in Minute Folder Than same structure for 20223 year, so in order for me to collect all the files I have to go to bottom of the structure which is Minute folder, if I use wildcard path it looks like this: Wildcard paths 'source from dataset"/ *.json, it copies everything including all folders, and I just want files, I tried to narrow it down and copies only first for 2022 but whatever I do is not working in terms of wildcard paths, help is much appreciated

trying different wildcard combinations did not help, obviously I am doing something wrong


Solution

  • There is no option to copy files from multiple sub- folders to single destination folder. Flatten hierarchy as a copy behavior also will have autogenerated file names in target.

    enter image description here image reference MS document on copy behaviour

    Instead, you can follow the below approach.

    • In order to list the file path in the container, take the Lookup activity and connect to xml dataset with HTTP linked service.

    Give the Base URL in HTTP connector as, https://<storage_account_name>.blob.core.windows.net/<container>?restype=directory&comp=list.

    [Replace <storage account name> and <container> with the appropriate name in the above URL]

    enter image description here

    • Lookup activity gives the list of folders and files as separate line items as in following image.

    enter image description here

    • Take the Filter activity and filter the URLs that end with .json from the lookup activity output.

    Settings of filter activity:

    items: @activity('Lookup1').output.value[0].EnumerationResults.Blobs.Blob

    condition: @endswith(item().URL,'.json')

    Output of filter activity enter image description here

    • Take the for-each activity next to filter activity and give the item of for-each as @activity('Filter1').output.value

    • Inside for-each activity, take the copy activity.

    • Take http connector and json dataset as source, give the base url as https://<account-name>.blob.core.windows.net/<container-name>/

    • Create the parameter for relative URL and value for that parameter as @item().name

    enter image description here

    • In sink, give the container name and folder name.
    • Give the file name as dynamic content. @split(item().name,'/')[sub(length(split(item().name,'/')),1)]

    This expression will take the filename from relative URL value. enter image description here

    • When the pipeline is run, all files from multiple folders got copied to single folder. enter image description here