Search code examples
sqlgoogle-bigqueryranking-functions

Partition By 2 Columns in BigQuery


Say I want to rank the top 3 months of revenue for each flat and the revenue amount, I can execute the following no problem:

Select * FROM (SELECT Flat,
EXTRACT(YEAR FROM pay_day) AS Year,
EXTRACT(MONTH FROM pay_day) AS Month,
RANK() OVER(PARTITION BY Flat ORDER BY SUM(USD_amt) DESC) AS rank, 
SUM(USD_amt) AS Revenue
FROM `finances.reservations` AS f
GROUP BY Flat, Year, Month)
WHERE rank<=3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month 
ASC

Results

   Flat Year Month Rank Revenue
    1   2019    12  1   3281
    1   2019    4   2   3031
    1   2020    7   3   3019
    1A  2020    11  1   1805
    1A  2020    9   2   1178
    1A  2020    5   3   1166
    2   2020    1   1   3419
    2   2020    7   2   2644
    2   2021    1   3   2460
    3   2019    10  3   2466
    3   2020    6   1   2558
    3   2020    1   2   2530
    4   2020    7   1   0

But now say I want the top 3 months for each year for each of the flats, I thought that I simply need to amend the partition by year as follows:

RANK() OVER(PARTITION BY Flat, EXTRACT(YEAR FROM pay_day) ORDER BY SUM(USD_amt) DESC) AS rank

And I expected the results to be like this:

   Flat Year Month Rank Revenue
    1   2019    12  1   3281
    1   2019    4   2   3031
    1   2019    1   3   2031
    1   2020    4   1   3031
    1   2020    9   2   3001
    1   2020    7   3   2919

But this results in an error "PARTITION BY expression references column pay_day which is neither grouped nor aggregated at [4:50]" I wonder what I am doing wrong?

The table schema is as follows:

Field name            | Type    | Mode
----------------------+---------+------------
Flat                  | STRING  | NULLABLE
pay_day               | DATE    | NULLABLE
nights                | INTEGER | NULLABLE
check_in              | DATE    | NULLABLE
check_out             | DATE    | NULLABLE
nights__in_month_     | STRING  | NULLABLE
nights_outside_month_ | STRING  | NULLABLE
cleaning              | INTEGER | NULLABLE
currency              | STRING  | NULLABLE
USD_amt               | INTEGER | NULLABLE
EANR                  | STRING  | NULLABLE
name                  | STRING  | NULLABLE
people                | INTEGER | NULLABLE
country               | STRING  | NULLABLE
reservation_no_       | STRING  | NULLABLE
payment_processor     | STRING  | NULLABLE
Check_in_day          | STRING  | NULLABLE
Cleaner               | STRING  | NULLABLE
Review                | STRING  | NULLABLE

Solution

  • I think you can reference the column alias in BigQuery:

    SELECT *
    FROM (SELECT Flat, EXTRACT(YEAR FROM pay_day) AS Year,
                 EXTRACT(MONTH FROM pay_day) AS Month,
                 RANK() OVER (PARTITION BY Flat, Year ORDER BY SUM(USD_amt) DESC) AS rank, 
                 SUM(USD_amt) AS Revenue
          FROM `finances.reservations` AS f
          GROUP BY Flat, Year, Month
         ) ym
    WHERE rank <= 3 AND Flat IS NOT NULL
    ORDER BY Flat, Year, Rank, Month ASC
    

    If not, you can use:

    RANK() OVER (PARTITION BY Flat, EXTRACT(YEAR FROM MIN(pay_day))
                 ORDER BY SUM(USD_amt) DESC
                ) AS rank, 
    

    The minimum date in the GROUP BY range has the same year, so this is effectively the same. I realize this is how I usually address this issue.