Search code examples
pythonpandasdataframegroup-by

How to check in which bin an entry of a pandas data frame group belongs?


I have a following problem. I have a pandas data frame, eg something like this:

example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                        'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                        'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
            'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                 11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}

df = pd.DataFrame(example)

What I want to build, is a column quantile_bin, which has integer values from 1 to 10, which tells me, to which quantile the specific number in wins for a specific Player belongs, where 1 is between 0 and 0.1 quantile, 2 is between 0.1 and 0.2 quantile, ..., and 10 is between 0.9 and 1.0 quantile.

In other words

df.groupby("Player")["wins"].apply(magic_quantile_function)

The result should be something like

>>> df
   Player  wins  quantile_bin
0       A     2             1
1       A     4             2
2       A     4             2
3       A     5             5
4       A     6             6
5       A     9             8
6       A    13            10
7       A    13            10
8       B    15            10
9       B    15            10
10      B    14             9
11      B    13             6
12      B    11             5
13      B     9             2
14      B     9             2
15      B     8             1
16      C     8             1
17      C    16             3
18      C    19             7
19      C    21            10
20      C    14             2
21      C    20             9
22      C    19             7
23      C    18             5

I would be grateful if one could help me defining this magic_quantile_function, or at least giving me a hint on how to pass a grouped wins distribution to a user defined function which is applied to a groupby object.

Thank you in advance!


Solution

  • Seems like you could use qcut

    df['quantile'] = df.groupby('Player')['wins'].transform(lambda s: pd.qcut(s, 
                                                                              q=10, 
                                                                              duplicates='drop',
                                                                              labels=range(2, 11)))
    

       Player  wins quantile
    0       A     2        2
    1       A     4        3
    2       A     4        3
    3       A     5        6
    4       A     6        7
    5       A     9        9
    6       A    13       10
    7       A    13       10
    8       B    15       10
    9       B    15       10
    10      B    14        9
    11      B    13        7
    12      B    11        6
    13      B     9        3
    14      B     9        3
    15      B     8        2
    16      C     8        2
    17      C    16        4
    18      C    19        7
    19      C    21       10
    20      C    14        3
    21      C    20        9
    22      C    19        7
    23      C    18        6