Search code examples
sqlpostgresqlsql-inrowtype

How to match elements in an array of composite type?


Let's say we have two tables:

CREATE TABLE element (
    pk1 BIGINT NOT NULL,
    pk2 BIGINT NOT NULL,
    pk3 BIGINT NOT NULL,
    -- other columns ...
    PRIMARY KEY (pk1, pk2, pk3)
);

CREATE TYPE element_pk_t AS (
    pk1 BIGINT,
    pk2 BIGINT,
    pk3 BIGINT
);

CREATE TABLE collection (
    id BIGINT,
    elements element_pk_t[] NOT NULL,
);

The element has a composite PK. The custom type element_pk_t registers a matching composite type. The collection table contains array of element_pk_t.

I want to query all rows from table element where the PK matches an element in a chosen collection.elements, in a single query.

What I've tried:

SELECT * 
FROM element 
WHERE (pk1, pk2, pk3) IN (SELECT unnest(elements) 
                          FROM collection 
                          WHERE id = 1);

I get an error in the IN clause:

ERROR: subquery has too few columns

However, this works:

SELECT * 
FROM element 
WHERE (pk1, pk2, pk3) IN ((1, 2, 3), (4, 5, 6));

So it seems that the problem is how to expand the customized type element_pk_t to 3 columns that can match (pk1, pk2, pk3).


Solution

  • This works:

    SELECT *
    FROM   element 
    WHERE  (pk1, pk2, pk3) IN (SELECT (unnest(elements)).*
                               FROM   collection
                               WHERE  id = 1);
    

    Or more verbose, but preferable:

    SELECT *
    FROM   element 
    WHERE  (pk1, pk2, pk3) IN (SELECT (e).*
                               FROM   collection c, unnest(c.elements) e
                               WHERE  c.id = 1);
    

    More robust and avoids evaluating unnest() multiple times. See:

    This works, too:

    SELECT *
    FROM   element 
    WHERE  ROW((pk1, pk2, pk3)) IN (SELECT unnest(elements)
                                    FROM   collection
                                    WHERE  id = 1);
    

    The core of the problem is that IN taking a subquery knows two separate forms. Quoting the manual:

    expression IN (subquery)

    row_constructor IN (subquery)

    Your failing query resolves to the second form, while you (understandably) expect the first. But the second form does this:

    The left-hand side of this form of IN is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. [...]

    My first and second query make it work by decomposing the row type to the right of the operator. So Postgres has three bigint values left and right and is satisfied.

    My third query makes it work by nesting the row type to the left in another row constructor. Postgres only decomposes the first level and ends up with a single composite type - matching the single composite type to the right.

    Note that the keyword ROW is required for the single field we are wrapping. The manual:

    The key word ROW is optional when there is more than one expression in the list.


    Your working query is subtly different as it provides a list of values to the right instead of a subquery (set). That's a different implementation taking a different code path. It even gets a separate chapter in the manual. This variant has no special treatment for a ROW constructor to the left. So it just works as expected (by you).

    More equivalent (working) syntax variants with = ANY:

    SELECT * FROM element 
    WHERE (pk1, pk2, pk3) = ANY ('{"(1,2,3)","(2,3,4)"}'::element_pk_t[]);
    
    SELECT * FROM element 
    WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3)::element_pk_t,(2,3,4)::element_pk_t]);
    
    SELECT * FROM element 
    WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3),(2,3,4)]::element[]);
    

    Also valid with (pk1, pk2, pk3)::element_pk_t or ROW(pk1, pk2, pk3)::element_pk_t

    See:

    Since your source is an array, Daniel's second query with (e.pk1, e.pk2, e.pk3) = ANY(c.elements) lends itself naturally.
    But for a bet on the fastest query, my money is on my second variant, because I expect it to use the PK index optimally.

    Just as proof of concept. Like a_horse commented: a normalized DB design will probably scale best.