Search code examples
google-bigquerypercentile

How to see if business was in top/bottom 25th percentile in BigQuery?


I have data that looks like the following:

business_id|business_name|  time_value  | time_bin |yoy_txn_growth
    1111   |     A       |  July 1,2022 |  Monthly |   .43
    2222   |     B       |  July 1,2022 |  Monthly |   .35
    1111   |     A       |  June 1,2022 |  Monthly |   .27
    2222   |     B       |  June 1,2022 |  Monthly |   .25
    1111   |     A       |  May 1,2022  |  Monthly |   .13
    2222   |     B       |  May 1,2022  |  Monthly |   .12
    1111   |     A       |  April 1,2022|  Monthly |   .09
    2222   |     B       |  April 1,2022|  Monthly |   .08
    1111   |     A       |  March 1,2022|  Monthly |   .20
    2222   |     B       |  March 1,2022|  Monthly |   .19
    1111   |     A       |  July 1,2022 | Quarterly|   .49
    2222   |     B       |  July 1,2022 | Quarterly|   .47
    1111   |     A       |  April 1,2022| Quarterly|   .02
    2222   |     B       |  April 1,2022| Quarterly|   .01

I'd like to know if for each time_value + time_bin if a business was in the top 25th percentile of yoy transaction growth and, separately if the business was in the bottom 25th percentile of yoy transaction growth. There may be many more businesses or there may be some cases as few as 1 in the whole table (when it would be true it was both in the top25 and the bottom25 to default).

business_id|business_name|  time_value  | time_bin |yoy_txn_growth| top25perc | bot25perc
    1111   |     A       |  July 1,2022 |  Monthly |   .43        |   TRUE    |   FALSE
    2222   |     B       |  July 1,2022 |  Monthly |   .35        |   FALSE   |   TRUE
    1111   |     A       |  June 1,2022 |  Monthly |   .23        |   TRUE    |   FALSE
    2222   |     B       |  June 1,2022 |  Monthly |   .25        |   FALSE   |   TRUE
    1111   |     A       |  May 1,2022  |  Monthly |   .13        |   TRUE    |   FALSE
    2222   |     B       |  May 1,2022  |  Monthly |   .12        |   FALSE   |   TRUE
    1111   |     A       |  April 1,2022|  Monthly |   .09        |   TRUE    |   FALSE
    2222   |     B       |  April 1,2022|  Monthly |   .08        |   FALSE   |   TRUE
    1111   |     A       |  March 1,2022|  Monthly |   .18        |   TRUE    |   FALSE
    2222   |     B       |  March 1,2022|  Monthly |   .19        |   FALSE   |   TRUE
    1111   |     A       |  July 1,2022 | Quarterly|   .46        |   TRUE    |   FALSE
    2222   |     B       |  July 1,2022 | Quarterly|   .47        |   FALSE   |   TRUE
    1111   |     A       |  April 1,2022| Quarterly|   .02        |   TRUE    |   FALSE
    2222   |     B       |  April 1,2022| Quarterly|   .01        |   FALSE   |   TRUE

How can I accomplish this in BigQuery?


Solution

  • For your requirement, percentile_cont can be used for getting percentiles with if condition. You can try below query to get the expected output.

    Query

    SELECT
     *,
    IF
     (PERCENTILE_CONT(yoy_txn_growth,
         0.75) OVER(PARTITION BY time_bin, time_value) <yoy_txn_growth,
       TRUE,
       FALSE) AS top25perc,
    IF
     (PERCENTILE_CONT(yoy_txn_growth,
         0.25) OVER(PARTITION BY time_bin, time_value)>yoy_txn_growth,
       TRUE,
       FALSE) AS bot25perc
    FROM `project.dataset.Percentiletest`
    ORDER BY time_value,time_bin
    

    Output

    enter image description here