Search code examples
sqlsql-serveruuid

Cross joining a single row subquery with a column using `newid()` results in every row having a different GUID


Abstract

A query like

SELECT *
       FROM elbat t
            CROSS JOIN (SELECT newid() guid) x;

in SQL Server produces a result where every row has a different GUID instead of every row having a common GUID throughout the result. How to have one GUID for all rows of the result (without using a variable or (temporary) table)?

Setup

Consider the following table in an SQL Server database.

CREATE TABLE elbat
             (id integer);

INSERT INTO elbat
            VALUES (1);
INSERT INTO elbat
            VALUES (2);
INSERT INTO elbat
            VALUES (3);
INSERT INTO elbat
            VALUES (4);
INSERT INTO elbat
            VALUES (5);
INSERT INTO elbat
            VALUES (6);

Let's run the following query.

SELECT *
       FROM elbat t
            CROSS JOIN (SELECT newid() guid) x;

Here's a db<>fiddle and an SQL Fiddle to see it in action.

Problem

Much to my surprise, in the result every row has a different GUID. E.g.:

 id | guid                                
 -: | :-----------------------------------
  1 | ad146af7-9ebd-4521-a440-47c7dea6a1d4
  2 | ce24fbb8-af64-480c-8c46-1e03187642c5
  3 | 14509451-9b1d-49e9-8da2-c691947ae805
  4 | 37a86339-e352-486f-b541-92798540599f
  5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  6 | d491275b-4ebb-461b-94e2-93b47e7d2348

That left me puzzled. I'd expected every row to have the same GUID throughout the whole result set. E.g.:

 id | guid                                
 -: | :-----------------------------------
  1 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  2 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  3 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  4 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
  6 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8

Of course I understand, that the GUID will change from call to call. But I don't understand why it changes from row to row as I cross joined a single GUID and didn't put the newid() call in the list of projected columns.

Additional information

I tried this with all available versions on the fiddle platforms and additionally with a local Microsoft SQL Server 2014 (12.0.2269.0 (X64), Express). The result is the same everywhere (only the GUIDs change of course).

Questioning my understanding of joins, I also did some tests on other DBMS with an equivalent setup and query.

  • (DDL and DML stay the same for all of them.)
  • Postgres:

    SELECT *
           FROM elbat t
                CROSS JOIN (SELECT uuid_generate_v4() guid) x;
    

    SQL Fiddle

  • Oracle:

    SELECT *
           FROM elbat t
                CROSS JOIN (SELECT sys_guid() guid
                                   FROM dual) x;
    

    db<>fiddle

  • MariaDB:

    SELECT *
           FROM elbat t
                CROSS JOIN (SELECT uuid() guid) x;
    

    db<>fiddle

  • MySQL:

    SELECT *
           FROM elbat t
                CROSS JOIN (SELECT uuid() guid) x;
    

    SQL Fiddle

All of these other DBMS produce a result as I actually expect -- one common GUID in all rows of the result.

I also played around changing the query. To no avail though.

  • Put the subquery in a CTE.
  • I tried selecting from a physical table in the subquery, rather than the SELECT without a FROM, using TOP and also a primary key.
  • Using an implicit cross join (FROM elbat, (SELECT newid() ...)).
  • Using CROSS APPLY.

Looking up the documentation, I couldn't find this behavior covered anywhere.

Question

Why does SQL Server behave different than all the other (tested) DBMS (in that regard) and is there a way to get a result as expected (without using a variable or (temporary) table)?

(Note: I know I could use a variable initialized with newid() and have it in the projected columns. But The problem actually arose while I was trying to avoid such a variable. I actually wanted to look for a variable-less, query-only solution for "Order table randomly but with exceptions".)


Solution

  • I am really surprised at the SQL Server behavior. I did not realize that it re-evaluates such subqueries over-and-over. I suspect the cause is an optimization: the expression in the cross join is actually moved to the the node that reads the data, so the function is called over and over.

    In any case, I consider this wrong. Such an optimization should recognize that newid() is a volatile function and adjust accordingly.

    After some experimentation, I figured out that order by in the subquery does cause it to be evaluated only once. So, this does what you want:

    select *
    from elbat cross join
         (select top (1) newid() as guid
          order by guid
         ) x;
    

    Another version that does what you expect:

    select *
    from elbat cross join
         (select max(newid()) as guid
         ) x;
    

    Incidentally, this latter version works in the select as well:

    select *, (select max(newid())) as guid
    from elbat ;
    

    And in this case, I would expect the subquery to be evaluated once for every row. Go figure.