Search code examples
loopsazurecursorazure-data-lakeu-sql

U-SQL cursor in Azure Data Lake


I need to process data in Azure Data Lake. My flow is as follows:

  1. I would like to select from the database list of IDs for next processing. This I have done.
  2. I need to iterate through IDs (from the first step) and I need to successively export data into separated files (partitioned by ID)

The problem is following statemanet:

U-SQL’s procedures do not provide any imperative code-flow constructs such as a for or while loops.

Any idea how to process data in similar way as with cursor?

I didn't find any documentation regarding to the cursors in U-SQL.

Thank you!


Solution

  • There are no cursors in U-SQL, because of the statement you reference above.

    U-SQL does not provide any imperative code-flow constructs because it impedes the optimizer's ability to globally optimize your script.

    You should think of approaching your problem declaratively. For example, if you have a list of IDs (either in a table or SqlArray or even a file), use a declarative join. For example, you want to add 42 to every value where the key is in a list of existing keys:

    // Two options for providing the "looping data"
    // Option 1: Array Variable
    DECLARE @keys_var = new SqlArray<string>{"k1", "k2", "k3"};
    
    // Option 2: Rowset (eg from an EXTRACT from file, a table or other place)
    @keys = SELECT * FROM (VALUES("k1"), ("k2"), ("k3")) AS T(key);
    
    // This is the data you want to iterate over to add 42 to the value column for every matching key
    @inputdata = SELECT * FROM (VALUES (1, "k1"), (2, "k1"), (3, "k2"), (6, "k5")) AS T(value, key);
    
    //Option 1:
    @res = SELECT value+42 AS newval, key FROM @inputdata WHERE @keys_var.Contains(key);
    
    OUTPUT @res TO "/output/opt1.csv" USING Outputters.Csv();
    
    //Option 2:
    @res = SELECT value+42 AS newval, i.key 
        FROM @inputdata AS i INNER JOIN @keys AS k 
             ON i.key == k.key;
    
    OUTPUT @res TO "/output/opt2.csv" USING Outputters.Csv();
    

    Now in your case, you want to have data-driven output file sets. This feature is currently being worked on (it is one of our top asks). Until then you would have to write a script to generate the script (I will provide an example on your other question).