I have a DataFrame like
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 0
5 100 L 71 0
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 0
12 101 L 67 0
13 101 L 68 0
14 101 L 69 0
15 101 L 71 0
16 101 L 72 0
17 201 R 67 0
18 201 R 68 0
19 201 R 69 0
ord
essentially gives the ordering of the entries when (prop
, chi
and id
) all have the same value. This isn't quite what I'd like though. Instead, I'd like to be able to enumerate the entries of each group g in {(id, chi)}
from 0 to n_g where n_g is the size of group g. So I'd like to obtain something that looks like
id chi prop count
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2
I'd like to know if there's a simple way of doing this with pandas
. The following comes very close, but it feels way too complicated, and it for some reason won't let me join
the resulting dataframe with the original one.
(df.groupby(['id', 'chi'])
.apply(lambda g: np.arange(g.shape[0]))
.apply(pd.Series, 1)
.stack()
.rename('counter')
.reset_index()
.drop(columns=['level_2']))
EDIT: A second way of course is the for
loop way, but I'm looking for something more "Pythonic" than:
for gname, idx in df.groupby(['id','chi']).groups.items():
tmp = df.loc[idx]
df.loc[idx, 'counter'] = np.arange(tmp.shape[0])
R has a very simple way of achieving this behaviour using the tidyverse
packages, but I haven't quite found the well-oiled way to achieve the same thing with pandas
. Any help provided is greatly appreciated!
.groupby().cumcount()
df.assign(ord=df.groupby(['id', 'chi']).cumcount())
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2
defaultdict
and itertools.count
from itertools import count
from collections import defaultdict
d = defaultdict(count)
df.assign(ord=[next(d[t]) for t in zip(df.id, df.chi)])
id chi prop ord
0 100 L 67 0
1 100 L 68 1
2 100 L 68 2
3 100 L 68 3
4 100 L 70 4
5 100 L 71 5
6 100 R 67 0
7 100 R 68 1
8 100 R 68 2
9 100 R 68 3
10 110 R 70 0
11 110 R 71 1
12 101 L 67 0
13 101 L 68 1
14 101 L 69 2
15 101 L 71 3
16 101 L 72 4
17 201 R 67 0
18 201 R 68 1
19 201 R 69 2