I need to process data in Azure Data Lake. My flow is as follows:
The problem is following statemanet:
U-SQL’s procedures do not provide any imperative code-flow constructs such as a for or while loops.
Any idea how to process data in similar way as with cursor?
I didn't find any documentation regarding to the cursors in U-SQL.
Thank you!
There are no cursors in U-SQL, because of the statement you reference above.
U-SQL does not provide any imperative code-flow constructs because it impedes the optimizer's ability to globally optimize your script.
You should think of approaching your problem declaratively. For example, if you have a list of IDs (either in a table or SqlArray or even a file), use a declarative join. For example, you want to add 42 to every value where the key is in a list of existing keys:
// Two options for providing the "looping data"
// Option 1: Array Variable
DECLARE @keys_var = new SqlArray<string>{"k1", "k2", "k3"};
// Option 2: Rowset (eg from an EXTRACT from file, a table or other place)
@keys = SELECT * FROM (VALUES("k1"), ("k2"), ("k3")) AS T(key);
// This is the data you want to iterate over to add 42 to the value column for every matching key
@inputdata = SELECT * FROM (VALUES (1, "k1"), (2, "k1"), (3, "k2"), (6, "k5")) AS T(value, key);
//Option 1:
@res = SELECT value+42 AS newval, key FROM @inputdata WHERE @keys_var.Contains(key);
OUTPUT @res TO "/output/opt1.csv" USING Outputters.Csv();
//Option 2:
@res = SELECT value+42 AS newval, i.key
FROM @inputdata AS i INNER JOIN @keys AS k
ON i.key == k.key;
OUTPUT @res TO "/output/opt2.csv" USING Outputters.Csv();
Now in your case, you want to have data-driven output file sets. This feature is currently being worked on (it is one of our top asks). Until then you would have to write a script to generate the script (I will provide an example on your other question).