Search code examples
sqlpostgresqlexcept

Query with EXCEPT before GROUP BY fails


The following query is attempting to exclude one array from the main array, then group results.

SELECT utc_offset, is_dst,
       ltrim(
               trim(string_agg(distinct (CASE WHEN abbrev NOT LIKE '+%' AND abbrev NOT LIKE '-%' AND abbrev != name THEN abbrev ELSE '' END), ' ')) ||
               ' ' || string_agg(name, ', ' ORDER BY name)
       )
FROM pg_timezone_names
WHERE name NOT LIKE 'posix/%'
  AND name NOT LIKE 'Etc/%'
  AND (lower(abbrev) <> abbrev)
  AND name NOT IN ('HST', 'Factory', 'GMT', 'GMT+0', 'GMT-0', 'GMT0', 'localtime', 'UCT', 'Universal', 'UTC', 'PST8PDT', 'ROK', 'W-SU', 'MST', 'CST6CDT')
EXCEPT
SELECT n.*, a.*
FROM   pg_timezone_names n 
JOIN   pg_timezone_abbrevs a ON  a.abbrev = n.name
WHERE  n.utc_offset <> a.utc_offset
GROUP BY utc_offset, is_dst
ORDER BY utc_offset, is_dst
;

is instead generating an error:

ERROR:  column "pg_timezone_names.utc_offset" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT utc_offset, is_dst,

The query works without the EXCEPT block. Clearly there is a syntactic problem - the GROUP BY (and by extension) ORDER BY verbs are not being seen.

How should this query then be cast?


Solution

  • Question

    The stated objective is to ...

    exclude one array from the main array, then group results.

    I think you really want this:

    "Exclude time zones where an abbreviation with identical name but different offset exists - before aggregating qualifying rows."

    What's wrong with the query?

    • You really want to exclude rows before aggregating the string. Just adding parentheses won't fix it.
    • EXCEPT requires compatible SELECT lists left and right.
    • SELECT n.*, a.* is a repeat offender: it also disagrees with the GROUP BY clause - the immediate cause for the observed error message.

    Proof of concept

    You attempt would work like this:

    SELECT utc_offset, is_dst,
           ltrim(
                   trim(string_agg(distinct (CASE WHEN abbrev NOT LIKE '+%' AND abbrev NOT LIKE '-%' AND abbrev != name THEN abbrev ELSE '' END), ' ')) ||
                   ' ' || string_agg(name, ', ' ORDER BY name)
           )
    FROM  (
       SELECT utc_offset, is_dst, name, abbrev
       FROM   pg_timezone_names
       WHERE  name NOT LIKE 'posix/%'
       AND    name NOT LIKE 'Etc/%'
       AND    (lower(abbrev) <> abbrev)
       AND    name NOT IN ('HST', 'Factory', 'GMT', 'GMT+0', 'GMT-0', 'GMT0', 'localtime', 'UCT', 'Universal', 'UTC', 'PST8PDT', 'ROK', 'W-SU', 'MST', 'CST6CDT')
       EXCEPT
       SELECT n.utc_offset, n.is_dst, n.name, n.abbrev
       FROM   pg_timezone_names n 
       JOIN   pg_timezone_abbrevs a ON  a.abbrev = n.name
       WHERE  n.utc_offset <> a.utc_offset
       ) sub
    GROUP BY utc_offset, is_dst
    ORDER BY utc_offset, is_dst
    ;
    

    But don't. Rather use this 100 % equivalent ...

    Superior query

    SELECT utc_offset, is_dst
         , concat_ws(' ', string_agg(DISTINCT abbrev, ' ' ORDER BY abbrev) FILTER (WHERE NOT abbrev ^@ ANY ('{+,-}') AND abbrev <> name)
                        , string_agg(name, ', ' ORDER BY name)) AS abbrevs_and_names
    FROM   pg_timezone_names n
    WHERE  lower(abbrev) <> abbrev
    AND    NOT name ^@ ANY ('{posix/, Etc/}')
    AND    name <> ALL ('{HST, Factory, GMT, GMT+0, GMT-0, GMT0, localtime, UCT, Universal, UTC, PST8PDT, ROK, W-SU, MST, CST6CDT}')
    AND    NOT EXISTS (  -- !!!
       SELECT FROM pg_timezone_abbrevs a
       WHERE  a.abbrev = n.name
       AND    a.utc_offset <> n.utc_offset
       )
    GROUP  BY utc_offset, is_dst
    ORDER  BY utc_offset, is_dst
    ;
    

    fiddle

    NOT EXISTS is much easier than EXCEPT here. (It is most of the time.) See:

    I threw in a couple other (optional) simplifications & optimizations:

    About the "starts with" operator ^@ (requires at least Postgres 11):

    About the ANY construct:

    About null-safe concatenation with concat_ws():

    About the aggregate FILTER clause: