Search code examples
sqlgoogle-bigqueryaggregate-functions

MAX() Partition by is rising Query Processing Size - SQL in GBQ


I have the following source data in SQL in GBQ:

    SELECT 1 AS client_id,  CAST('2022-10-13' AS DATE) AS session_date, 'denied' AS value,
UNION ALL
SELECT 1,  CAST('2022-10-15' AS DATE), 'granted'
UNION ALL
SELECT 1, CAST('2022-10-18' AS DATE), 'denied'
UNION ALL
SELECT 2, CAST('2022-01-01' AS DATE), 'denied'
UNION ALL
SELECT 2, CAST('2022-01-05' AS DATE), 'granted'
UNION ALL
SELECT 3, CAST('2022-01-01' AS DATE), 'granted'
UNION ALL
SELECT 4, CAST('2022-01-03' AS DATE), 'granted'

And I want to get a table like this where I have the last activity date for each user with an struct that saves every activity session date and it's value (denied/granted):

enter image description here

I have used the following query to achieve this:

WITH max_date AS (
      SELECT
        client_id,
        session_date,
        value,
        MAX(session_date) OVER (PARTITION BY client_id) AS last_activity
      FROM
        source_data )
    SELECT
      client_id,
      last_activity,
      ARRAY_AGG(STRUCT(session_date,
          value)
      ORDER BY
        session_date) AS push_permission
    FROM
      max_date
    GROUP BY
      1,
      2

The issue is that the query size is going from 11GB to 1.2TB. I don't understand why this is happening, as I have used the max() partition by over other times and this wasn't happening. Is there any way to fix this? It seems it's missing the partition for some reason.

Thanks!


Solution

  • I don't think that you need window functions here, aggregation is sufficient; the last activity date can be computed with a simple aggregate max().

    Is the query less resource-consuming if we phrase it like this?

    select client_id, max(session_date) as last_activity
        array_agg(struct(session_date, value) order by session_date) as push_permission
    from source_data
    group by client_id
    

    Note that the query still needs to scan and aggregate the whole table - given your requirement, this cannot be avoided.