I have a following problem. I have a pandas data frame, eg something like this:
example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
df = pd.DataFrame(example)
What I want to build, is a column quantile_bin
, which has integer values from 1 to 10, which tells me, to which quantile the specific number in wins
for a specific Player
belongs, where 1 is between 0 and 0.1 quantile, 2 is between 0.1 and 0.2 quantile, ..., and 10 is between 0.9 and 1.0 quantile.
In other words
df.groupby("Player")["wins"].apply(magic_quantile_function)
The result should be something like
>>> df
Player wins quantile_bin
0 A 2 1
1 A 4 2
2 A 4 2
3 A 5 5
4 A 6 6
5 A 9 8
6 A 13 10
7 A 13 10
8 B 15 10
9 B 15 10
10 B 14 9
11 B 13 6
12 B 11 5
13 B 9 2
14 B 9 2
15 B 8 1
16 C 8 1
17 C 16 3
18 C 19 7
19 C 21 10
20 C 14 2
21 C 20 9
22 C 19 7
23 C 18 5
I would be grateful if one could help me defining this magic_quantile_function
, or at least giving me a hint on how to pass a grouped wins
distribution to a user defined function which is applied to a groupby object.
Thank you in advance!
Seems like you could use qcut
df['quantile'] = df.groupby('Player')['wins'].transform(lambda s: pd.qcut(s,
q=10,
duplicates='drop',
labels=range(2, 11)))
Player wins quantile
0 A 2 2
1 A 4 3
2 A 4 3
3 A 5 6
4 A 6 7
5 A 9 9
6 A 13 10
7 A 13 10
8 B 15 10
9 B 15 10
10 B 14 9
11 B 13 7
12 B 11 6
13 B 9 3
14 B 9 3
15 B 8 2
16 C 8 2
17 C 16 4
18 C 19 7
19 C 21 10
20 C 14 3
21 C 20 9
22 C 19 7
23 C 18 6