I'm attempting to perform some sort of upsert operation in U-SQL where I pull data every day from a file, and compare it with yesterdays data which is stored in a table in Data Lake Storage.
I have created an ID column in the table in DL using row_number(), and it is this "counter" I wish to continue when appending new rows to the old dataset. E.g. Last inserted row in DL table could look like this:
ID | Column1 | Column2
---+------------+---------
10 | SomeValue | 1
I want the next rows to have the following ascending ids
11 | SomeValue | 1
12 | SomeValue | 1
How would I go about making sure that the next X rows continues the ID count incrementally such that the next rows each increases the ID column by 1 more than the last?
You could use ROW_NUMBER
then add it to the the max value from the original table (ie using CROSS JOIN
and MAX
). A simple demo of the technique:
DECLARE @outputFile string = @"\output\output.csv";
@originalInput =
SELECT *
FROM ( VALUES
( 10, "SomeValue 1", 1 )
) AS x ( id, column1, column2 );
@newInput =
SELECT *
FROM ( VALUES
( "SomeValue 2", 2 ),
( "SomeValue 3", 3 )
) AS x ( column1, column2 );
@output =
SELECT id, column1, column2
FROM @originalInput
UNION ALL
SELECT (int)(x.id + ROW_NUMBER() OVER()) AS id, column1, column2
FROM @newInput
CROSS JOIN ( SELECT MAX(id) AS id FROM @originalInput ) AS x;
OUTPUT @output
TO @outputFile
USING Outputters.Csv(outputHeader:true);
My results:
You will have to be careful if the original table is empty and add some additional conditions / null checks but I'll leave that up to you.