Search code examples
sqlgoogle-bigquery

Aggregate data in array of structs to strings - BigQuery


I have some data that in simplified terms looks something like this:

enter image description here

What I want to do is summarise this into a table where each ID appears once, and the parts are summarise in some form of aggregate string for each column, but importantly that each of these aggregate strings maintains the same order. For example:

enter image description here

For a single column this would be easy enough, but in order to guarantee I maintain the order I presumably need to use an array of structs? In which case I can use some SQL along these lines to aggregate the data:

SELECT id, ARRAY_AGG(STRUCT(part_id AS ids, part_value AS values)) AS part FROM table

At this stage, however, I'm stuck on how I would aggregate each element of the STRUCT separately and turn it back into a 'normal' field.

Edited for clarity: replaced screenshots, as previous example had unique values in part_id and they were already ordered. Neither of these should have been assumed.


Solution

  • The database system may return the results in any order, you need to use the ORDER BY clause to define the sorting based on one or more columns ( Example you can use a unique ID or a date (timestamp) column, or both ):

    SELECT id, STRING_AGG(part_id, '|' ORDER BY ordering_column) as part_ids,
               STRING_AGG(CAST(part_value AS STRING), '|' ORDER BY ordering_column) as part_values
    FROM mytable
    GROUP  id