In PostgreSQL, I have a table that, abstractly, looks like this:
╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝
I want to transform it in a query into this:
╔═══╦═══╦══════╗
║ A ║ B ║ D ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1 ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0 ║
╚═══╩═══╩══════╝
…such that:
For each A and B pair:
If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.
For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.
Otherwise, if no such row exists, then the output table has a row (A, B, null).
For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.
I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.
To get a single row from each group with the same (A, B)
there is a simple, short and fast way: DISTINCT ON
- not involving aggregate functions, window functions, or subqueries at all:
SELECT DISTINCT ON (A, B)
A, B, CASE WHEN A = C THEN D END AS D
FROM tbl
ORDER BY A, B, (A = C) DESC;
Produces your desired result exactly.
db<>fiddle here
Assuming all involved columns are defined NOT NULL
, or you need to do more.
The final ORDER BY
item (A = C) DESC
sorts the row with A = C
first per group. It's a boolean
expression, and FALSE
sorts before TRUE
. If there can be multiple rows, add more ORDER BY
items to break ties.
CASE WHEN A = C THEN D END
implements your requirement that D
is only output for the given condition. Else we get NULL
(the default for CASE
), as desired.
Detailed explanation:
More performance optimization may be possible for big tables: