I understand that the SQL standard allows multiple NULL values in a column that is part of the UNIQUE constraint.
What I don't understand is why the UNION construct (at least in PostgreSQL,) treats NULL values as the same. For example:
$ select * from tmp_a;
a | b
---+---
a | b
a |
|
(3 rows)
$ select * from tmp_b;
a | b
---+---
a | c
a |
|
(3 rows)
$ select a, b from tmp_a union select a, b from tmp_b order by 1, 2;
a | b
---+---
a | b
a | c
a |
|
(4 rows)
The General Rule in the SQL-92 Standard is as follows:
13.1 'declare cursor' (remember ORDER BY
is part of a cursor)
General Rule 3b:
the following special treatment of null values. Whether a sort key value that is null is considered greater or less than a non-null value is implementation-defined, but all sort key values that are null shall either be considered greater than all non-null values or be considered less than all non-null values.
The SQL-89 stated the same a little more clearly IMO:
Although
x = y
is unknown if bothx
andy
areNULL
values, in the context ofGROUP BY
,ORDER BY
andDISTINCT
, aNULL
value is identical to or is a duplicate of anotherNULL
value.
I would guess that PostgreSQL is performing a sort to remove duplicates as required by UNION
and is grouping NULL
values together in line with Standards.