Search code examples
sqlpostgresqlcomposite-types

How to access fields of anonymous record?


Given the following query:

WITH t as (
    SELECT name, array_agg(DISTINCT("age", "gender")) as "ages_and_genders"
    FROM (
        SELECT * FROM (VALUES ('bob', 33, 'm'), ('bob', 33, 'f'), ('alice', 30, 'f')) AS t ("name","age", "gender")
    ) as t
    GROUP BY name
)

SELECT name, "ages_and_genders"[1]
FROM t
WHERE array_length("ages_and_genders", 1) = 1

How do I go about breaking apart the record/tuple returned into the "age" and "gender" as separate columns?

I expect to get back a result:

name    | age | gender
-------------------
"alice" | 30  | 'f'

Solution

  • Postgres cannot decompose anonymous record types. To access individual fields, the nested structure must be known. Cast to a well-known row type. If no matching type exists, yet, register one first. There are various ways. For ad-hoc use, a "temporary" type (undocumented) is advisable. (For repeated use, register a plain type.)

    The manual:

    By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS. An explicit cast might be needed to avoid ambiguity.

    Run once in your session:

    CREATE TYPE pg_temp.my_int_txt AS (age int, gender text);
    

    Then, with a drop-in fix (preserving other awkward syntax):

    WITH t as (
        SELECT name, array_agg(DISTINCT("age", "gender")::pg_temp.my_int_txt) as "ages_and_genders"
        FROM (
            SELECT * FROM (VALUES ('bob', 33, 'm'), ('bob', 33, 'f'), ('alice', 30, 'f')) AS t ("name","age", "gender")
        ) as t
        GROUP BY name
    )
    SELECT name, ("ages_and_genders"[1]).age, ("ages_and_genders"[1]).gender
    FROM t
    WHERE array_length("ages_and_genders", 1) = 1;
    

    The same, consolidated:

    WITH cte AS (
       SELECT name, array_agg(DISTINCT ROW(age, gender)::pg_temp.my_int_txt) AS ag
       FROM  (
         VALUES
           ('bob'  , 33, 'm')
         , ('bob'  , 33, 'f')
         , ('alice', 30, 'f')
       ) AS t (name, age, gender)
       GROUP BY name
       )
    SELECT name, (ag[1]).*
    FROM   cte
    WHERE  cardinality(ag) = 1;
    

    Of course, I still wouldn't use that.
    Typically, there are much simpler and faster solutions. Like:

    SELECT name, min(age) AS age, min(gender) AS gender
    FROM  (
       VALUES
         ('bob'  , 33, 'm')
       , ('bob'  , 33, 'f')
       , ('alice', 30, 'f')
       ) AS t (name, age, gender)
    GROUP  BY name
    HAVING count(DISTINCT (age, gender)) = 1;
    

    Or:

    WITH cte(name, age, gender) AS (
       VALUES
         ('bob'  , 33, "char" 'm')
       , ('bob'  , 33, 'f')
       , ('alice', 30, 'f')
       )
    SELECT SELECT DISTINCT ON (name) *
    FROM   cte t
    WHERE  NOT EXISTS (
       SELECT FROM cte t1
       WHERE t1.name = t.name
       AND  (t1.age, t1.gender) IS DISTINCT FROM (t.age, t.gender)
       );
    

    fiddle

    If age and gender are NOT NULL, you can simplify.

    Notes

    You may not be aware of all the things going on in your code example.

    You don't need SELECT * FROM (VALUES ... in this context. The VALUES expression can stand on its own.

    The simple string and numeric constants default to the basic types text and integer. For other data you may want explicit type declarations.
    For example, it's typically better to work with birthdays instead of age. The type date would need an explicit cast or declaration in the first input row (or any, really) like:

    WITH t(name, birthday, gender) AS (
       VALUES
         ('bob', date '1990-06-01', 'm')  -- !
       , ('bob', '1990-06-01', 'f')
       ...
    

    DISTINCT is not a function. In your query, it's a syntax element of the aggregate function.
    And ("age", "gender") is a ROW constructor (with redundant double-quotes).
    Clear syntax for both: array_agg(DISTINCT ROW(age, gender)).

    If all values are NOT NULL, simpler expressions are possible. My given queries are null-safe.

    Postgres' handling of anonymous records is a bit lacking in corner case situations. Not least because few care, as there are almost always better solutions.

    For functions returning anonymous records, you can append a column definition list. Not applicable here.