Search code examples
sqlgoogle-bigqueryaggregate-functions

Column neither grouped nor aggregated after introducing window query


I have trouble integrating a simple window function into my query. I work with this avocado dataset from Kaggle. I started off with a simple query:

SELECT
  date,
  SUM(Total_Bags) as weekly_bags,
FROM 
  `course.avocado`
WHERE
  EXTRACT(year FROM date) = 2015
GROUP BY
  date
ORDER BY
  date

And it works just fine. Next, I want to add the rolling sum to the query to display along the weekly sum. I tried the following:

SELECT
  date,
  SUM(Total_Bags) as weekly_bags,
  SUM(Total_Bags) OVER(
    PARTITION BY date
    ORDER BY date
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
  )
FROM 
  `course.avocado`
WHERE
  EXTRACT(year FROM date) = 2015
GROUP BY
  date
ORDER BY
  date

but im getting the common error:

SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]

and im confused. Total_Bags in the first query was aggregated yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.


Solution

  • In your query, which returns 2 columns: date and aggregate SUM(Total_Bags), the window function SUM() is evaluated after the aggregation when there is no column Total_Bags and this is why you can't use it inside the window function.

    However, you can do want you want, without group by, by using only window functions and DISTINCT:

    SELECT DISTINCT date,
           SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
           SUM(Total_Bags) OVER(
             PARTITION BY date
             ORDER BY date
             ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
           )
    FROM course.avocado
    WHERE EXTRACT(year FROM date) = 2015
    ORDER BY date;
    

    or, use window function on the the aggregated result:

    SELECT date,
           SUM(Total_Bags) AS weekly_bags,
           SUM(SUM(Total_Bags)) OVER(
             ORDER BY date
             ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
           )
    FROM course.avocado
    WHERE EXTRACT(year FROM date) = 2015
    GROUP BY date
    ORDER BY date;