Combining and aggregating key/value pairs across different events/rows

I have a table consisting of customer events. These events have arrays containing keys and values.

The following simulated table illustrates what these events could look like:

user	event_date	event_timestamp	store	city	purchases.key	purchases.int_value	purchases.float_value	purchases.string_value
XXX	2025-01-01	1735693200	store_1	london	type			online
					ware			banana
					amount	2
					price		0.5
XXX	2025-01-01	1735693201	store_1	london	type			online
					ware			apple
					amount	3
					price		0.6
XXX	2025-01-01	1735693202	store_1	london	type			online
					ware			banana
					amount	3
					price		0.5
XXX	2025-01-02	1735693203	store_1	london	type			online
					ware			bread
					amount	1
					price		1.0
YYY	2025-01-01	1735693204	store_2	sydney	type			physical
					ware			milk
					amount	1
					price		1.5

WITH
  CTE_TEST_DATA AS (
    SELECT
      'XXX'     AS user
      ,'2025-01-01' AS event_date
      ,'store_1'    AS store
      ,'london'     AS city
      ,[
          STRUCT('type' AS key, NULL AS int_value, NULL as float_value, 'online' AS string_value),
          STRUCT('ware', NULL, NULL, 'banana'),
          STRUCT('amount', 2, NULL, NULL),
          STRUCT('price', NULL, 0.5, NULL)
        ] AS purchases
    UNION ALL
    SELECT
      'XXX'     AS user
      ,'2025-01-01' AS event_date
      ,'store_1'    AS store
      ,'london'     AS city
      ,[
          STRUCT('type', NULL, NULL, 'online'),
          STRUCT('ware', NULL, NULL, 'apple'),
          STRUCT('amount', 3, NULL, NULL),
          STRUCT('price', NULL, 0.6, NULL)
        ] AS purchases
    UNION ALL
    SELECT
      'XXX'     AS user
      ,'2025-01-01' AS event_date
      ,'store_1'    AS store
      ,'london'     AS city
      ,[
          STRUCT('type', NULL, NULL, 'online'),
          STRUCT('ware', NULL, NULL, 'banana'),
          STRUCT('amount', 3, NULL, NULL),
          STRUCT('price', NULL, 0.5, NULL)
        ] AS purchases
    UNION ALL
    SELECT
      'XXX'     AS user
      ,'2025-01-02' AS event_date
      ,'store_1'    AS store
      ,'london'     AS city
      ,[
          STRUCT('type', NULL, NULL, 'online'),
          STRUCT('ware', NULL, NULL, 'bread'),
          STRUCT('amount', 1, NULL, NULL),
          STRUCT('price', NULL, 1.0, NULL)
        ] AS purchases
    UNION ALL
    SELECT
      'YYY'     AS user
      ,'2025-01-01' AS event_date
      ,'store_2'    AS store
      ,'sydney'     AS city
      ,[
          STRUCT('type', NULL, NULL, 'physical'),
          STRUCT('ware', NULL, NULL, 'milk'),
          STRUCT('amount', 1, NULL, NULL),
          STRUCT('price', NULL, 1.5, NULL)
        ] AS purchases
    )

The actual Customer Events table is quite big, hence a daily aggregation query has been created to make analysis and reporting less costly.

This daily aggregation query yields aggregated results per user per event_date:

SELECT
   CTD.user
  ,CTD.event_date
  ,STRUCT(COUNT(*) as amount) AS events_per_user
  ,*𝑜𝑡ℎ𝑒𝑟 𝑎𝑔𝑔𝑟𝑒𝑔𝑎𝑡𝑒𝑠*
FROM
  CTE_TEST_DATA  AS CTD
GROUP BY
   CTD.user
  ,CTD.event_date

When new metrics are added to the Customer Events table, the daily aggregation query must be modified to include these.

Because of this setup, I have a few constraints when adding aggregated columns, the main ones being that new metrics must be added exclusively using correlated subqueries, and that Common Table Expressions cannot be used.

The result I am trying to achieve is the following, where the array contains only distinct ware names:

user	event_date	events_per_user.amount	ware.name	ware.distinct_orders	total.amount
xxx	2025-01-01	3	banana	2	5
			apple	1	3
yyy	2025-01-01	1	milk	1	1
xxx	2025-01-02	1	bread	1	1

Unfortunately, the closest I have been to achieve that result, is using the following query:

SELECT
   CTD.user
  ,CTD.event_date
  ,STRUCT(COUNT(*) as amount)   AS events_per_user
  ,ARRAY_AGG((
    SELECT AS STRUCT
       P.string_value           AS name
      ,COUNT(P.string_value)    AS distinct_orders
    FROM
      UNNEST(CTD.purchases) AS P
    WHERE 1=1
      AND P.key = 'ware'
    GROUP BY
      P.string_value
  )) AS ware
  ,ARRAY_AGG((
    SELECT AS STRUCT
      SUM(P.int_value)          AS amount
    FROM
      UNNEST(CTD.purchases) AS P
    WHERE 1=1
      AND P.key = 'amount'
    GROUP BY
       user
      ,event_date
  )) AS total
FROM
  CTE_TEST_DATA AS CTD
GROUP BY
   CTD.user
  ,CTD.event_date

which results in this table:

user	event_date	events_per_user.amount	ware.name	ware.distinct_orders	total.amount
xxx	2025-01-01	3	banana	1	2
			apple	1	3
			banana	1	3
yyy	2025-01-01	1	milk	1	1
xxx	2025-01-02	1	bread	1	1

I have spent hours researching and trying different methods to achieve this result, unfortunately without success, and I am left wondering if it is even possible.

Any help would be greatly appreciated!

Solution

See example.
There ware_totalSum as totalSum - new metric added to your example.

  -- aggregate by user,event_date and array of data values
 select user,event_date,sum(events_per_user_ware)as events_per_user
     ,array_agg(
         struct (wareV as ware,ware_distinct_orders as ware_distinct_orders,amount as amount,totalSum as totalSum)
     ) wd -- ware_data
  from ( -- aggregate by user,event_date,wareV
     select user,event_date,wareV
      ,count(*) events_per_user_ware
      ,count(*) ware_distinct_orders
      ,sum(amountV) as amount
      ,sum(priceV*amountV) totalSum
    from (  -- extract data values
      select user,event_date,store,city 
        ,purchases[0].string_value as typeV
        ,purchases[1].string_value as wareV
        ,purchases[2].int_value as amountV
        ,purchases[3].float_value as priceV
      from CTE_TEST_DATA
    )raw_data
    group by user,event_date,wareV
  ) agg_data
  group by user,event_date

Row	user	event_date	events_per_user	wd.ware	wd.ware_distinct_orders	wd.amount	wd.totalSum
1	XXX	2025-01-01	3	banana	2	5	2.5
				apple	1	3	1.7999999999999998
2	XXX	2025-01-02	1	bread	1	1	1.0
3	YYY	2025-01-01	1	milk	1	1	1.5

Update1

Your restriction on using CTE and on pre-grouping data in general looks unclear. Of course, it is possible to fulfill these requirements. Due to the significant impact on performance.

  select user,event_date
       ,count(*) as events_per_user,sum(purchases[2].int_value) as amount_per_user 
     --, sum(purchases[2].int_value*purchases[3].float_value)as totalSum_per_user
       ,(select 
           array_agg(
             struct (wareV as ware,ware_distinct_orders as ware_distinct_orders
                      ,amount as amount,totalSum as totalSum) )
         from (
            select user,event_date,purchases[1].string_value as wareV
              ,count(*) as ware_distinct_orders,sum(purchases[2].int_value) as amount
              ,sum(purchases[2].int_value*purchases[3].float_value) as totalSum
            from CTE_TEST_DATA 
            group by user,event_date,wareV
           ) rw
         where rw.user=t.user and rw.event_date=t.event_date
         group by user,event_date
        )as wd
  from  CTE_TEST_DATA t 
  group by user,event_date;