Abstract
A query like
SELECT *
FROM elbat t
CROSS JOIN (SELECT newid() guid) x;
in SQL Server produces a result where every row has a different GUID instead of every row having a common GUID throughout the result. How to have one GUID for all rows of the result (without using a variable or (temporary) table)?
Setup
Consider the following table in an SQL Server database.
CREATE TABLE elbat
(id integer);
INSERT INTO elbat
VALUES (1);
INSERT INTO elbat
VALUES (2);
INSERT INTO elbat
VALUES (3);
INSERT INTO elbat
VALUES (4);
INSERT INTO elbat
VALUES (5);
INSERT INTO elbat
VALUES (6);
Let's run the following query.
SELECT *
FROM elbat t
CROSS JOIN (SELECT newid() guid) x;
Here's a db<>fiddle and an SQL Fiddle to see it in action.
Problem
Much to my surprise, in the result every row has a different GUID. E.g.:
id | guid
-: | :-----------------------------------
1 | ad146af7-9ebd-4521-a440-47c7dea6a1d4
2 | ce24fbb8-af64-480c-8c46-1e03187642c5
3 | 14509451-9b1d-49e9-8da2-c691947ae805
4 | 37a86339-e352-486f-b541-92798540599f
5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
6 | d491275b-4ebb-461b-94e2-93b47e7d2348
That left me puzzled. I'd expected every row to have the same GUID throughout the whole result set. E.g.:
id | guid
-: | :-----------------------------------
1 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
2 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
3 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
4 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
5 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
6 | cbee1a8e-02ce-4915-8d2c-ef5db299d8c8
Of course I understand, that the GUID will change from call to call. But I don't understand why it changes from row to row as I cross joined a single GUID and didn't put the newid()
call in the list of projected columns.
Additional information
I tried this with all available versions on the fiddle platforms and additionally with a local Microsoft SQL Server 2014 (12.0.2269.0 (X64), Express). The result is the same everywhere (only the GUIDs change of course).
Questioning my understanding of joins, I also did some tests on other DBMS with an equivalent setup and query.
Postgres:
SELECT *
FROM elbat t
CROSS JOIN (SELECT uuid_generate_v4() guid) x;
Oracle:
SELECT *
FROM elbat t
CROSS JOIN (SELECT sys_guid() guid
FROM dual) x;
MariaDB:
SELECT *
FROM elbat t
CROSS JOIN (SELECT uuid() guid) x;
MySQL:
SELECT *
FROM elbat t
CROSS JOIN (SELECT uuid() guid) x;
All of these other DBMS produce a result as I actually expect -- one common GUID in all rows of the result.
I also played around changing the query. To no avail though.
SELECT
without a FROM
, using TOP
and also a primary key.FROM elbat, (SELECT newid() ...)
).CROSS APPLY
.Looking up the documentation, I couldn't find this behavior covered anywhere.
Question
Why does SQL Server behave different than all the other (tested) DBMS (in that regard) and is there a way to get a result as expected (without using a variable or (temporary) table)?
(Note: I know I could use a variable initialized with newid()
and have it in the projected columns. But The problem actually arose while I was trying to avoid such a variable. I actually wanted to look for a variable-less, query-only solution for "Order table randomly but with exceptions".)
I am really surprised at the SQL Server behavior. I did not realize that it re-evaluates such subqueries over-and-over. I suspect the cause is an optimization: the expression in the cross join
is actually moved to the the node that reads the data, so the function is called over and over.
In any case, I consider this wrong. Such an optimization should recognize that newid()
is a volatile function and adjust accordingly.
After some experimentation, I figured out that order by
in the subquery does cause it to be evaluated only once. So, this does what you want:
select *
from elbat cross join
(select top (1) newid() as guid
order by guid
) x;
Another version that does what you expect:
select *
from elbat cross join
(select max(newid()) as guid
) x;
Incidentally, this latter version works in the select
as well:
select *, (select max(newid())) as guid
from elbat ;
And in this case, I would expect the subquery to be evaluated once for every row. Go figure.