Search code examples
snowflake-cloud-data-platformstring-aggregation

Snowflake - LISTAGG with LIMIT


Snowflake allows to concatenate string across multiple rows by using aggregate/windowed LISTAGG function. Sometimes we just need first few occurrences per group. Similar to GROUP_CONCAT with LIMIT:

SELECT grp, COUNT(*) AS cnt,
       GROUP_CONCAT(col ORDER BY col SEPARATOR ',' LIMIT 3) AS first_3 
FROM tab
GROUP BY grp;

db<>fiddle demo

Example:

CREATE OR REPLACE TABLE tab(grp TEXT, col TEXT) AS
SELECT * FROM VALUES 
  ('Grp1', 'A'),('Grp1', 'B'),('Grp1', 'C'),('Grp1', 'D'), ('Grp1', 'E'),
  ('Grp2', 'X'),('Grp2', 'Y'),('Grp2', 'Z'),('Grp2', 'V'),
  ('Grp3', 'M'),('Grp3', 'N'),('Grp3', 'O');

For provided sample input the expected result is:

grp cnt first_3
Grp1 5 A,B,C
Grp2 4 X,Y,Z
Grp3 3 M,N,0

The idea is to find simple way of controlling number of rows considered for LISTAGG without affecting other aggregate functions and using subqueries like:

SELECT grp, 
  COUNT(*) AS cnt,
  LISTAGG(CASE WHEN rn <= 3 THEN col END, ',') WITHIN GROUP(ORDER BY col) AS first_3
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY col) AS rn
      FROM tab)
GROUP BY grp;

Solution

  • The alternative to aggregate string concatenation is creation of array and transforming it to string afterwards. MIN_BY/MAX_BY allows to choose number of rows for returned array.

    SELECT grp, COUNT(*) AS cnt, ARRAY_TO_STRING(MIN_BY(col, col, 3), ',') AS first_3
    FROM tab
    GROUP BY grp;
    

    Output:

    enter image description here

    Here: ARRAY_TO_STRING and MIN_BY is effectively an equivalent of LISTAGG with LIMIT.


    It also works well for windowed scenario, if all rows are required:

    SELECT grp, 
      COUNT(*) OVER(PARTITION BY grp) AS cnt, 
      ARRAY_TO_STRING(MIN_BY(col, col, 3) OVER(PARTITION BY grp), ',') AS first_3
    FROM tab
    ORDER BY grp;
    

    enter image description here

    If LISTAGG(DISTINCT ...) is expected then ARRAY_DISTINCT can be added:

    SELECT grp, COUNT(*) AS cnt,
           ARRAY_TO_STRING(ARRAY_DISTINCT(MIN_BY(col, col, 3)), ',') AS first_3_distinct
    FROM tab
    GROUP BY grp;