Search code examples
sqlgroup-bygoogle-bigquerysql-order-by

Ordering within ARRAY_AGG after GROUP BY in BigQuery


I have a BigQuery table:

create or replace table `project.table.mock` as (
  select 1 as col0, 'a' as col1, 'x' as col2
    union all
  select 2 as col0, 'a' as col1, 'y' as col2
    union all
  select 4 as col0, 'b' as col1, 'z' as col2
    union all
  select 8 as col0, 'b' as col1, 'X' as col2
    union all
  select 7 as col0, 'b' as col1, 'Y' as col2
)

Visualization:

enter image description here

I would like to group by column col1, and array_agg the results from col2. I would like to have the elements appearing in each array to be sorted by col0.

I am now at:

select array_agg(col2) as col1arrays from `project.table.mock` group by col1;

which gives me:

enter image description here

The desired output in the second row would be [z, Y, X] (as the row where z appears in col2 has 4 in col0, the row where Y appears in col2 has 7 in col0 and the row where X appears in col2 has 8 in col0, and 4 < 7 < 8.

How can I achieve ordering within array_agg, as described above, in BigQuery?


Solution

  • You can add ORDER BY clause in ARRAY_AGG() function.

    SELECT ARRAY_AGG(col2 ORDER BY col1 ASC) AS col1arrays 
      FROM `project.table.mock`  
     GROUP BY col1;
    
    WITH mock as (
      select 1 as col0, 'a' as col1, 'x' as col2
        union all
      select 2 as col0, 'a' as col1, 'y' as col2
        union all
      select 4 as col0, 'b' as col1, 'z' as col2
        union all
      select 8 as col0, 'b' as col1, 'X' as col2
        union all
      select 7 as col0, 'b' as col1, 'Y' as col2
    )
    select array_agg(col2 ORDER BY col0) as col1arrays from mock group by col1;
    
    output:
    +------------+
    | col1arrays |
    +------------+
    | [x,y]      |
    | [z,Y,X]    |
    +------------+