Tags: azure, azure-data-factory, azure-databricks

Azure Data Factory V2 - Cannot combine wildcard file names with a dynamic content filepath. Is there a Databricks (ADB) solution or another ADF solution?


I currently have an upstream process that dumps a near-random number of .zip files into Azure Data Lake Storage, with each folder named something like FILES/PROCESSING/2019/04/19.

I created an Azure Data Factory V2 (ADF) Copy Data process to dynamically grab any files in today's filepath, but there seems to be an issue with combining dynamic content filepaths and wildcard file names, as shown below.

Is there any workaround for this in ADF?

Thanks!

Here's my Linked Service's dynamic filepath with wildcard file names:

FILES/PROCESSING/@formatDateTime(utcnow(),'yyyy')/@formatDateTime(utcnow(),'mm')/@formatDateTime(utcnow(),'dd')

and the wildcard is:

/*.zip

I expect the process to run, but instead get this error message:

Activity CopyNewFiles failed: Failure happened on 'Source' side. ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot find the 'Azure Data Lake Store' file. . Service request id: c0266e28-d841-40b7-b177-e67d5e5388a1 Response details: {"RemoteException":{"exception":"FileNotFoundException","message":"File/Folder does not exist: /FILES/PROCESSING/2019/04/30 [c0266e28-d841-40b7-b177-e67d5e5388a1][2019-04-30T12:08:55.0353825-07:00]","javaClassName":"java.io.FileNotFoundException"}},Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=System,'

However, the file path DOES exist. If I run the process manually, pointing directly at the file without the dynamic content, it runs just fine.

I've looked through ADF documentation trying to see if this is a known bug, and I'm not seeing anything that fits the bill.


Solution

  • This expression should work as your folder path:

    @Concat('FILES/PROCESSING/',utcnow('yyyy/MM/dd'))
    

    with *.zip as the wildcard file name.

    Use only one @, at the very start of the expression.

    You can also embed the functions inline, as you have done, but each expression must be wrapped in curly braces, as in @{...}. The evaluated values are then substituted directly into the string, so no concat is needed:

    FILES/PROCESSING/@{formatDateTime(utcnow(),'yyyy')}/@{formatDateTime(utcnow(),'MM')}/@{formatDateTime(utcnow(),'dd')}/*.zip
    

    Also note the capital MM for month; lowercase mm means minutes :)
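
To sanity-check which folder the expression ought to resolve to, the same path can be rebuilt outside ADF. The following is a minimal Python sketch, not ADF code: the helper names (dated_folder, dated_folder_with_minutes, zip_files) are hypothetical, and Python's %m / %M stand in for the MM / mm format specifiers. It also shows why the original 'mm' version can point at a folder that does not exist.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase

ROOT = "FILES/PROCESSING"  # root folder taken from the question


def dated_folder(now: datetime) -> str:
    """Folder the corrected expression should yield: root/year/month/day."""
    # %m is the zero-padded month, the counterpart of 'MM' in the ADF format string.
    return f"{ROOT}/{now.strftime('%Y')}/{now.strftime('%m')}/{now.strftime('%d')}"


def dated_folder_with_minutes(now: datetime) -> str:
    """Mimics the 'mm' mistake: the minute ends up in the month slot."""
    # %M is the zero-padded minute, the counterpart of lowercase 'mm'.
    return f"{ROOT}/{now.strftime('%Y')}/{now.strftime('%M')}/{now.strftime('%d')}"


def zip_files(names):
    """Keep only names matching the *.zip wildcard (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, "*.zip")]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("expected folder:   ", dated_folder(now))
    print("with minutes (bug):", dated_folder_with_minutes(now))
    print(zip_files(["a.zip", "notes.txt", "b.zip"]))
```

Comparing the printed folder against what actually exists in the lake is a quick way to tell a bad expression apart from a genuinely missing folder.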