Snowflake allows to concatenate string across multiple rows by using aggregate/windowed LISTAGG function. Sometimes we just need first few occurrences per group. Similar to GROUP_CONCAT with LIMIT
:
SELECT grp, COUNT(*) AS cnt,
GROUP_CONCAT(col ORDER BY col SEPARATOR ',' LIMIT 3) AS first_3
FROM tab
GROUP BY grp;
Example:
CREATE OR REPLACE TABLE tab(grp TEXT, col TEXT) AS
SELECT * FROM VALUES
('Grp1', 'A'),('Grp1', 'B'),('Grp1', 'C'),('Grp1', 'D'), ('Grp1', 'E'),
('Grp2', 'X'),('Grp2', 'Y'),('Grp2', 'Z'),('Grp2', 'V'),
('Grp3', 'M'),('Grp3', 'N'),('Grp3', 'O');
For provided sample input the expected result is:
grp | cnt | first_3 |
---|---|---|
Grp1 | 5 | A,B,C |
Grp2 | 4 | X,Y,Z |
Grp3 | 3 | M,N,0 |
The idea is to find simple way of controlling number of rows considered for LISTAGG
without affecting other aggregate functions and using subqueries like:
SELECT grp,
COUNT(*) AS cnt,
LISTAGG(CASE WHEN rn <= 3 THEN col END, ',') WITHIN GROUP(ORDER BY col) AS first_3
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY col) AS rn
FROM tab)
GROUP BY grp;
The alternative to aggregate string concatenation is creation of array and transforming it to string afterwards. MIN_BY/MAX_BY allows to choose number of rows for returned array.
SELECT grp, COUNT(*) AS cnt, ARRAY_TO_STRING(MIN_BY(col, col, 3), ',') AS first_3
FROM tab
GROUP BY grp;
Output:
Here: ARRAY_TO_STRING
and MIN_BY
is effectively an equivalent of LISTAGG
with LIMIT
.
It also works well for windowed scenario, if all rows are required:
SELECT grp,
COUNT(*) OVER(PARTITION BY grp) AS cnt,
ARRAY_TO_STRING(MIN_BY(col, col, 3) OVER(PARTITION BY grp), ',') AS first_3
FROM tab
ORDER BY grp;
If LISTAGG(DISTINCT ...)
is expected then ARRAY_DISTINCT
can be added:
SELECT grp, COUNT(*) AS cnt,
ARRAY_TO_STRING(ARRAY_DISTINCT(MIN_BY(col, col, 3)), ',') AS first_3_distinct
FROM tab
GROUP BY grp;