u-sql

How to merge files for N number of days using U-SQL


I want to merge the data from 7 days of files and perform some operations on it using U-SQL. The folder structure on ADLS is /sample/data/YYYY/MM/DD.1.csv. For example, if today is 03/01/2018 (DD/MM/YYYY), I want to pick up the data from 27/12/2017 to 02/01/2018.

How can I achieve this in U-SQL?


Solution

  • You should use virtual columns in your file path. The parts of the path in curly braces become a column that you can filter on.
    For example:

     DECLARE @path = "/sample/data/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}.1.csv";
     DECLARE @startDate = new DateTime(2017,12,27);
     DECLARE @endDate = new DateTime(2018,01,02);

     // FileDate is a virtual column filled in from the yyyy/MM/dd parts of each file path
     @data = EXTRACT
             column1 string,
             column2 string,
             FileDate DateTime
         FROM @path
         USING Extractors.Csv(); // or whichever extractor you are using

     // Keep only the rows that come from files inside the date range
     @result =
         SELECT *
         FROM @data
         WHERE FileDate BETWEEN @startDate AND @endDate;

     OUTPUT @result
     TO "/sample/data/appended.csv"
     USING Outputters.Text(delimiter : ';', outputHeader : true); // write the merged rows to one CSV file
    

    This code should extract all the data in your date range. P.S. I am not sure I wrote the file name pattern exactly right, so check it against your folder layout. Hope this helps. The U-SQL documentation on file sets has more details about virtual columns.
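    The question asks for the last 7 days relative to "today" rather than fixed dates. A minimal sketch of that variant follows, assuming that DateTime.Today is evaluated when the script is compiled and that the cluster's date matches the date you mean by "today" (neither is confirmed in the question). The column names and the output path /sample/output/last7days.csv are placeholders, not taken from the question.

```usql
// Sketch only: derive the 7-day window instead of hard-coding it.
// Assumption: DateTime.Today is resolved at compile time and matches your "today".
DECLARE @path string = "/sample/data/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}.1.csv";
DECLARE @endDate DateTime = DateTime.Today.AddDays(-1);   // yesterday, e.g. 2018-01-02 when run on 2018-01-03
DECLARE @startDate DateTime = @endDate.AddDays(-6);       // 7 days inclusive, e.g. 2017-12-27

// FileDate is the virtual column built from the date parts of each file path.
@data =
    EXTRACT column1 string,   // placeholder column
            column2 string,   // placeholder column
            FileDate DateTime
    FROM @path
    USING Extractors.Csv();

// Keep only rows whose source file falls inside the window.
@lastWeek =
    SELECT column1,
           column2,
           FileDate
    FROM @data
    WHERE FileDate BETWEEN @startDate AND @endDate;

// Hypothetical output location, kept outside /sample/data.
OUTPUT @lastWeek
TO "/sample/output/last7days.csv"
USING Outputters.Csv(outputHeader : true);
```

    If the job is scheduled, it may be safer to pass the reference date in as a parameter than to rely on the clock of the machine that compiles the script.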