Search code examples
azureu-sqlazure-data-factory

Azure data factory (ADFv2) - how to process multiple input files from different folder in a USQL job


We have a requirement to process multiple files from different folders using Azure data factory and USQL.

Here is the folder structure we have

Year --> Month --> Day

We have a folder for every date, say 1,2,3...31. The requirement is to read files from specific folders and pass it to USQL to do analytics processing. We need to process data for multiple dates. Is there any way in data factory we can read data from multiple folders.

Example: I need to read data for the dates, 1,7 and 10 for a specific month. I do not want to read all the files for the month

Please let us know if you have come across a solution for the above scenario.


Solution

  • yes there is. You should use Virtual columns. Example: Your file has only column1 and column2.

    path1=/yourFolder/2018/11/1/file.csv

    path2=/yourFolder/2018/10/25/file.csv

    DECLARE date1 = new DateTime(2018,11,1);
    DECLARE date2 = new DateTime(2018,10,25);
    @inputData = EXTRACT column1 string,
    column2 string, 
    FileDate DateTime //this is virtual column
    FROM "/yourFolder/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}/file.csv"
    USING Extractors.Text(delimiter:';',skipFirstNRows:1);
    @res = SELECT * FROM @inputData WHERE FileDate == date1 AND FileDate ==date2;
    

    This way you can manage your folder structure using virtual columns. You can ofcourse use BETWEEN if you want to take files from some date range etc. HTH

    P.S. You can send parameters from ADF to USQL stored procedure i just gave an example how to work with specific files.