Search code examples
google-bigqueryranking

How to use Rank() in BigQuery and restart Rank counting by timeframe?


In BigQuery, I'm wanting to create a ranked list by SKU sales per Season, however I'm getting a bit lost with how to do this with the current state of my code. My desired output is something like:

Season      | SKU         | total_spent | Rank 
Spring 2020 | SKU_sample1 | $200 | 1
Spring 2020 | SKU_sample2 | $190 | 2
Spring 2020 | SKU_sample3 | $185 | 3

--- so forth, then restart the ranking when the Season changes

Season      | SKU         | total_spent | Rank 
Halloween 2020 | SKU_sample1 | $500 | 1
Halloween 2020 | SKU_sample2 | $400 | 2
Halloween 2020 | SKU_sample3 | $300 | 3

My base code is this:

SELECT DATE(sales_time) as sales_time,
    CASE
        WHEN DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23' THEN 'Spring 2020'
        WHEN DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02' THEN 'Halloween 2020'
        WHEN DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03' THEN 'Thanksgiving 2020'
        WHEN DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04' THEN 'Xmas 2020'
      ELSE
      'unknown_season'
    END
      AS season,
    sku,
    SUM(salesPrice) as total_spent
    FROM sales_table
    WHERE
       DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23' 
        OR (DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02')
        OR (DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03')
        OR (DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04')
    GROUP BY  sku,
    DATE(sales_time),
    salesPrice,
    season

Solution

  • I've broken it down a bit using CTEs to encapsulate the ranking logic:

    with data as (
        select
            sku,
            case
                when date(sales_time) between '2020-04-09' and '2020-04-23' then 'Spring 2020'
                when date(sales_time) between '2020-10-29' and '2020-11-02' then 'Halloween 2020'
                when date(sales_time) between '2020-11-25' and '2020-12-03' then 'Thanksgiving 2020'
                when date(sales_time) between '2020-12-17' and '2021-01-04' then 'Xmas 2020'
                else 'unknown_season'
            end as season,
            sum(salesPrice) as total_spent
        from `project.dataset.sales_table`
        where
            (
                date(sales_time) between '2020-04-09' and '2020-04-23' or
                date(sales_time) between '2020-10-29' and '2020-11-02' or
                date(sales_time) between '2020-11-25' and '2020-12-03' or
                date(sales_time) between '2020-12-17' and '2021-01-04'
            )
            group by 1,2
    ),
    ranked as (
        select 
            season, 
            sku, 
            total_spent,
            -- Within each season, rank by total_spent -- could also use row_number() if you want to break ties
            rank() over(partition by season order by total_spent desc) as spend_rank
        from data
    )
    select * from ranked
    order by season, spend_rank asc