Search code examples
sqlgoogle-bigqueryaggregate-functionsdate-arithmetic

aggregation of aggregation not allowed


I am using google bigquery.

I want to find the average number of events for each day of the week (Sunday,monday,tuesday,wednesday,thursday,friday,saturday) for a certain time period (1 to 28 May).

For example there were 4 Sundays between May 1 to May 28, I want to find out average number of events for Sunday based on those 4 Sundays.

Here is my code:

SELECT EXTRACT(DAYOFWEEK FROM TIMESTAMP_MICROS(event_timestamp)) AS day,
       AVG(COUNT(*)) AS average_events
FROM `table`
WHERE DATE_TRUNC(EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)), DAY)
      BETWEEN DATETIME('2023-05-01 00:00:00') AND DATETIME('2023-05-28 00:00:00')
GROUP BY day
ORDER BY day

this gives an error like "Aggregations of aggregations are not allowed"

edit:

I have tried the following -

SELECT day,
       AVG(counts) AS average_events
FROM (
   SELECT EXTRACT(DAYOFWEEK FROM event_timestamp) AS day,
          COUNT(*) AS counts
   FROM `table`
   WHERE DATE_TRUNC(event_timestamp, DAY)
         BETWEEN DATETIME('2023-05-01 00:00:00') AND DATETIME('2023-05-28 00:00:00')
   GROUP BY day
) AS t
GROUP BY day
ORDER BY day

but here, the count is the same as the average.


Solution

  • You want to aggregate by day first, and get each associated count - then by day of the week.

    I would recommend:

    select event_dow, avg(cnt) avg_cnt
    from (
        select 
            timestamp_trunc(timestamp_micros(event_timestamp), day) event_day
            extract(dayofweek from timestamp_micros(event_timestamp)) event_dow, 
            count(*) cnt
        from mytable
        where event_timestamp >= unix_micros(timestamp '2023-05-01')
          and event_timestamp <  unix_micros(timestamp '2023-05-29')
        group by event_day, event_dow
    ) t
    group by event_dow                                                           -- dow aggregation
    order by event_dow
    

    Note that I attempted to optimize the where clause of the query by filtering directly against original values (we can convert the bounds to unix timestamps instead) - and using half-open intervals.