Search code examples
sqlgroup-bycountgoogle-bigquerywindow-functions

Grouping by percentiles in bigquery


This might be a bit too easy question to ask here, but I can't seem to figure it out and started banging my head on the wall.

I want to assign my users scores from 1-5, the top 20% of users (say number of logins) receive a top score of 5, the next 20% a score of 4, and so on. So from the inital table; table1

SELECT userid, login -- APPROX_QUANTILES(x, 2) AS approx_quantiles
FROM UNNEST([01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]) AS userid,
     UNNEST([ 1, 12, 14,  3,  5, 75, 56,  7,  8,  9, 10,  1, 12, 14, 14,  5, 23, 56,  7, 10]) AS login;

it will look like table 2;

enter image description here


Solution

  • If I understand correctly, you want ntile():

    select userid, login, ntile(5) over(order by login) nt
    from mytable
    

    Depending on the dataset you are starting with, you might need to aggegate by user first:

    select userid, count(*) login, ntile(5) over(order by count(*)) nt
    from mytable
    group by userid