Let's say we have two tables:
CREATE TABLE element (
pk1 BIGINT NOT NULL,
pk2 BIGINT NOT NULL,
pk3 BIGINT NOT NULL,
-- other columns ...
PRIMARY KEY (pk1, pk2, pk3)
);
CREATE TYPE element_pk_t AS (
pk1 BIGINT,
pk2 BIGINT,
pk3 BIGINT
);
CREATE TABLE collection (
id BIGINT,
elements element_pk_t[] NOT NULL,
);
The element
has a composite PK. The custom type element_pk_t
registers a matching composite type. The collection
table contains array of element_pk_t
.
I want to query all rows from table element
where the PK matches an element in a chosen collection.elements
, in a single query.
What I've tried:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT unnest(elements)
FROM collection
WHERE id = 1);
I get an error in the IN
clause:
ERROR: subquery has too few columns
However, this works:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN ((1, 2, 3), (4, 5, 6));
So it seems that the problem is how to expand the customized type element_pk_t
to 3 columns that can match (pk1, pk2, pk3)
.
This works:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (unnest(elements)).*
FROM collection
WHERE id = 1);
Or more verbose, but preferable:
SELECT *
FROM element
WHERE (pk1, pk2, pk3) IN (SELECT (e).*
FROM collection c, unnest(c.elements) e
WHERE c.id = 1);
More robust and avoids evaluating unnest()
multiple times. See:
This works, too:
SELECT *
FROM element
WHERE ROW((pk1, pk2, pk3)) IN (SELECT unnest(elements)
FROM collection
WHERE id = 1);
The core of the problem is that IN
taking a subquery knows two separate forms. Quoting the manual:
expression IN (subquery)
row_constructor IN (subquery)
Your failing query resolves to the second form, while you (understandably) expect the first. But the second form does this:
The left-hand side of this form of
IN
is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. [...]
My first and second query make it work by decomposing the row type to the right of the operator. So Postgres has three bigint
values left and right and is satisfied.
My third query makes it work by nesting the row type to the left in another row constructor. Postgres only decomposes the first level and ends up with a single composite type - matching the single composite type to the right.
Note that the keyword ROW
is required for the single field we are wrapping. The manual:
The key word
ROW
is optional when there is more than one expression in the list.
Your working query is subtly different as it provides a list of values to the right instead of a subquery (set). That's a different implementation taking a different code path. It even gets a separate chapter in the manual. This variant has no special treatment for a ROW constructor to the left. So it just works as expected (by you).
More equivalent (working) syntax variants with = ANY
:
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY ('{"(1,2,3)","(2,3,4)"}'::element_pk_t[]);
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3)::element_pk_t,(2,3,4)::element_pk_t]);
SELECT * FROM element
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3),(2,3,4)]::element[]);
Also valid with (pk1, pk2, pk3)::element_pk_t
or ROW(pk1, pk2, pk3)::element_pk_t
See:
Since your source is an array, Daniel's second query with (e.pk1, e.pk2, e.pk3) = ANY(c.elements)
lends itself naturally.
But for a bet on the fastest query, my money is on my second variant, because I expect it to use the PK index optimally.
Just as proof of concept. Like a_horse commented: a normalized DB design will probably scale best.