Search code examples
sqlnullduplicatesuniondbnull

In PostgreSQL (and maybe other engines), why does the UNION statement consider NULL values the same, while the UNIQUE constraint does not?


I understand that the SQL standard allows multiple NULL values in a column that is part of the UNIQUE constraint.

What I don't understand is why the UNION construct (at least in PostgreSQL,) treats NULL values as the same. For example:

$ select * from tmp_a;
 a | b
---+---
 a | b
 a |
   |
(3 rows)

$ select * from tmp_b;
 a | b
---+---
 a | c
 a |
   |
(3 rows)

$ select a, b from tmp_a union select a, b from tmp_b order by 1, 2;
 a | b
---+---
 a | b
 a | c
 a |
   |
(4 rows)

Solution

  • The General Rule in the SQL-92 Standard is as follows:

    13.1 'declare cursor' (remember ORDER BY is part of a cursor) General Rule 3b:

    the following special treatment of null values. Whether a sort key value that is null is considered greater or less than a non-null value is implementation-defined, but all sort key values that are null shall either be considered greater than all non-null values or be considered less than all non-null values.

    The SQL-89 stated the same a little more clearly IMO:

    Although x = y is unknown if both x and y are NULL values, in the context of GROUP BY, ORDER BY and DISTINCT, a NULL value is identical to or is a duplicate of another NULL value.

    I would guess that PostgreSQL is performing a sort to remove duplicates as required by UNION and is grouping NULL values together in line with Standards.