Search code examples
sqlgoogle-bigquerywindow-functions

Adding grouping in framing clause window while creating partitions


Using the dataset hosted on Google (MBL Data) as an example, here is what I am accomplishing to do - obtain last 3 weeks score run for a given Venue.

My aggregated dataset looks like this without the strikes_3wk column - enter image description here

Logic for strikes_3wk column is to partition the aggregated dataset by venueName, order by YearWeek column and then obtain the last 3 weeks aggregated strikes data.

Here is the query I have written so far. I see that the windowing function is where I need to modify the logic. So, is there a way to add grouping within the windowing function? Is there any alternative way of doing this?

In the image I added a new column 'expected', showing values for two weeks.

select inr.*
       ,sum(inr.strikes) over (Venue_Week rows between current row and 2 following) as strikes_3wk
from
(
    select seasonType
        ,gameStatus
        ,homeTeamName
        ,awayTeamName
        ,venueName
        ,CAST(
        CONCAT(
            CAST(EXTRACT(YEAR FROM createdAt) as string)
            ,CAST(EXTRACT(WEEK(Monday) FROM createdAt) as string)
            ) as INT64)
            as YearWeek
        ,sum(homeFinalRuns) as homeFinalRuns
        ,sum(strikes) as strikes
    from  `bigquery-public-data.baseball.games_wide`
    where   createdAt is not null
    group by seasonType
        ,gameStatus
        ,homeTeamName
        ,awayTeamName
        ,venueName
        ,YearWeek
)inr
window Venue_Week as (
    partition by inr.venueName
    order by inr.YearWeek desc
)

Solution

  • So you are looking for strikes per venue regardless of who did them, right?

    May be something like:

    SELECT INR.*, STATS.strikes_3wk 
    FROM `bigquery-public-data.baseball.games_wide` INR
      LEFT JOIN (
        SELECT venueName, SUM(strikes) as strikes_3wk 
        FROM `bigquery-public-data.baseball.games_wide` INR2
        WHERE YearWeek IN (
          SELECT TOP 3 YearWeek 
          FROM `bigquery-public-data.baseball.games_wide` 
          WHERE venueName = INR2.venueName
          ORDER BY YearWeek DESC
        )
        GROUP BY venueName
      ) STATS 
        ON INR.venueName = STATS.venueName