Say I want to rank the top 3 months of revenue for each flat and the revenue amount, I can execute the following no problem:
Select * FROM (SELECT Flat,
EXTRACT(YEAR FROM pay_day) AS Year,
EXTRACT(MONTH FROM pay_day) AS Month,
RANK() OVER(PARTITION BY Flat ORDER BY SUM(USD_amt) DESC) AS rank,
SUM(USD_amt) AS Revenue
FROM `finances.reservations` AS f
GROUP BY Flat, Year, Month)
WHERE rank<=3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month
ASC
Results
Flat Year Month Rank Revenue
1 2019 12 1 3281
1 2019 4 2 3031
1 2020 7 3 3019
1A 2020 11 1 1805
1A 2020 9 2 1178
1A 2020 5 3 1166
2 2020 1 1 3419
2 2020 7 2 2644
2 2021 1 3 2460
3 2019 10 3 2466
3 2020 6 1 2558
3 2020 1 2 2530
4 2020 7 1 0
But now say I want the top 3 months for each year for each of the flats, I thought that I simply need to amend the partition by year as follows:
RANK() OVER(PARTITION BY Flat, EXTRACT(YEAR FROM pay_day) ORDER BY SUM(USD_amt) DESC) AS rank
And I expected the results to be like this:
Flat Year Month Rank Revenue
1 2019 12 1 3281
1 2019 4 2 3031
1 2019 1 3 2031
1 2020 4 1 3031
1 2020 9 2 3001
1 2020 7 3 2919
But this results in an error "PARTITION BY expression references column pay_day which is neither grouped nor aggregated at [4:50]" I wonder what I am doing wrong?
The table schema is as follows:
Field name | Type | Mode
----------------------+---------+------------
Flat | STRING | NULLABLE
pay_day | DATE | NULLABLE
nights | INTEGER | NULLABLE
check_in | DATE | NULLABLE
check_out | DATE | NULLABLE
nights__in_month_ | STRING | NULLABLE
nights_outside_month_ | STRING | NULLABLE
cleaning | INTEGER | NULLABLE
currency | STRING | NULLABLE
USD_amt | INTEGER | NULLABLE
EANR | STRING | NULLABLE
name | STRING | NULLABLE
people | INTEGER | NULLABLE
country | STRING | NULLABLE
reservation_no_ | STRING | NULLABLE
payment_processor | STRING | NULLABLE
Check_in_day | STRING | NULLABLE
Cleaner | STRING | NULLABLE
Review | STRING | NULLABLE
I think you can reference the column alias in BigQuery:
SELECT *
FROM (SELECT Flat, EXTRACT(YEAR FROM pay_day) AS Year,
EXTRACT(MONTH FROM pay_day) AS Month,
RANK() OVER (PARTITION BY Flat, Year ORDER BY SUM(USD_amt) DESC) AS rank,
SUM(USD_amt) AS Revenue
FROM `finances.reservations` AS f
GROUP BY Flat, Year, Month
) ym
WHERE rank <= 3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month ASC
If not, you can use:
RANK() OVER (PARTITION BY Flat, EXTRACT(YEAR FROM MIN(pay_day))
ORDER BY SUM(USD_amt) DESC
) AS rank,
The minimum date in the GROUP BY
range has the same year, so this is effectively the same. I realize this is how I usually address this issue.