I have data that looks like the following:
business_id|business_name| time_value | time_bin |yoy_txn_growth
1111 | A | July 1,2022 | Monthly | .43
2222 | B | July 1,2022 | Monthly | .35
1111 | A | June 1,2022 | Monthly | .27
2222 | B | June 1,2022 | Monthly | .25
1111 | A | May 1,2022 | Monthly | .13
2222 | B | May 1,2022 | Monthly | .12
1111 | A | April 1,2022| Monthly | .09
2222 | B | April 1,2022| Monthly | .08
1111 | A | March 1,2022| Monthly | .20
2222 | B | March 1,2022| Monthly | .19
1111 | A | July 1,2022 | Quarterly| .49
2222 | B | July 1,2022 | Quarterly| .47
1111 | A | April 1,2022| Quarterly| .02
2222 | B | April 1,2022| Quarterly| .01
I'd like to know if for each time_value + time_bin if a business was in the top 25th percentile of yoy transaction growth and, separately if the business was in the bottom 25th percentile of yoy transaction growth. There may be many more businesses or there may be some cases as few as 1 in the whole table (when it would be true it was both in the top25 and the bottom25 to default).
business_id|business_name| time_value | time_bin |yoy_txn_growth| top25perc | bot25perc
1111 | A | July 1,2022 | Monthly | .43 | TRUE | FALSE
2222 | B | July 1,2022 | Monthly | .35 | FALSE | TRUE
1111 | A | June 1,2022 | Monthly | .23 | TRUE | FALSE
2222 | B | June 1,2022 | Monthly | .25 | FALSE | TRUE
1111 | A | May 1,2022 | Monthly | .13 | TRUE | FALSE
2222 | B | May 1,2022 | Monthly | .12 | FALSE | TRUE
1111 | A | April 1,2022| Monthly | .09 | TRUE | FALSE
2222 | B | April 1,2022| Monthly | .08 | FALSE | TRUE
1111 | A | March 1,2022| Monthly | .18 | TRUE | FALSE
2222 | B | March 1,2022| Monthly | .19 | FALSE | TRUE
1111 | A | July 1,2022 | Quarterly| .46 | TRUE | FALSE
2222 | B | July 1,2022 | Quarterly| .47 | FALSE | TRUE
1111 | A | April 1,2022| Quarterly| .02 | TRUE | FALSE
2222 | B | April 1,2022| Quarterly| .01 | FALSE | TRUE
How can I accomplish this in BigQuery?
For your requirement, percentile_cont can be used for getting percentiles with if condition. You can try below query to get the expected output.
Query
SELECT
*,
IF
(PERCENTILE_CONT(yoy_txn_growth,
0.75) OVER(PARTITION BY time_bin, time_value) <yoy_txn_growth,
TRUE,
FALSE) AS top25perc,
IF
(PERCENTILE_CONT(yoy_txn_growth,
0.25) OVER(PARTITION BY time_bin, time_value)>yoy_txn_growth,
TRUE,
FALSE) AS bot25perc
FROM `project.dataset.Percentiletest`
ORDER BY time_value,time_bin
Output