Moving to Redshift from another provider, I see that GROUP BY ALL
(which groups by all non-aggregate columns in the SELECT statement) syntax is not supported. This was super useful for me when I had loads of non-aggregate columns and didn't want to manually type out GROUP BY 1,2,3,4...
etc.
I was wondering if there is in fact a similar command on Redshift or is it simply a limitation of PostgreSQL?
There is a group by
all
in both Redshift and Postgres but that's not what it does. You might be interested in group by
cube
, rollup
or grouping sets
. Postgres has them, Redshift has them too.
Unfortunately, there's no syntax for grouping by everything except aggregate expressions detected in the select list. There kind of is, but that's not entirely equivalent, and not as handy - see example at the end.
The group by
all
is default behaviour in both PostgreSQL and Redshift, the alternative being group by
distinct
which removes duplicate grouping sets. Quoting the doc:
When specifying multiple grouping items together, the final set of grouping sets might contain duplicates. For example:
GROUP BY ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, b), (a, c), (a), (a), (a, c), (a), () )
If these duplicates are undesirable, they can be removed using the
DISTINCT
clause directly on theGROUP BY
. Therefore:GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a), () )
It's similar to how select
is actually select all
unless you make it select distinct
. Again, the doc:
If
SELECT DISTINCT
is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates).SELECT ALL
specifies the opposite: all rows are kept; that is the default.
If there's a join, you can use a group by tablename.*
to make all columns of the table your grouping set but
join
conditions. The exception is if all their names match and you're willing to risk a natural join
.grouping()
function, but it'll just show that all rows belong to the 0th grouping set.select sum(test_vals.v)
from test
natural join test_vals
group by all test.*;
sum |
---|
774.07 |
2114.39 |
806.75 |
966.23 |
1644.51 |
The scenario I think @Lajos Arpad is referring to is this:
demo2 at db-fiddle
select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;
Error: column "test.b" must appear in the GROUP BY clause or be used in an aggregate function
Suggested addition of b
to the group by
list makes that go away only to be replaced by a request to do something about c
and all following columns. That's no longer the case if a
is a primary key
- you can group by only the PK, skipping the rest:
alter table test add primary key(a);
select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;
a | b | c | sum |
---|---|---|---|
1 | 7 | 3 | 4.42 |
2 | 8 | 1 | 3.50 |
3 | 1 | 2 | 5.18 |
You can find this, once more, in the doc:
If the products table is set up so that, say,
product_id
is the primary key, then it would be enough to group byproduct_id
in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.