Search code examples
temp-tablesazure-synapse

Azure SQL Data Warehouse temporary table creation with CTAS results in unexpected order of rows


I already wrote a code example with comments, so I will let it do the explaining:

/*
    Querying sys.system_views to produce a set of rows to use as an example.
    No matter how many times the SELECT statement is ran, the results are always the same.
*/
SELECT
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS sequence,
    s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'

/*
    Creating a temporary table using the CTAS principle, as is documented at:
    https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-loops
    (Since this behaviour was noticed when trying to execute procedures in a defined order.)
*/
CREATE TABLE #list WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS sequence,
    s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'

/*
    The results in the temporary table #list are not the
    same as the results of the SELECT statement when ran independently.

    No matter how many times the temporary table is created,
    the results are in the same order, which again,
    is not the resulting order when running the SELECT statement.
*/

SELECT *
FROM #list;

DROP TABLE #list;

Below is an example picture of the query results side-by-side.

Query results of SELECT and SELECT from #list

The issue is easily prevented by using a proper ORDER BY in the ROW_NUMBER function:

CREATE TABLE #list WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    ROW_NUMBER() OVER(ORDER BY s.name + '.' + sv.name) AS sequence,
    s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'

So my question is, why is the order different in the #list temporary table compared to the SELECT statement?


Solution

  • If you use non deterministic SQL constructs you can't expect deterministic results. ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) doesn't guarantee any particular ordering as all rows are tied for the ordering value. Any ordering is correct and you will just get whatever is most convenient for the execution plan.

    – Martin Smith

    Answer and discussion in the comments of the question.