I have data dataframe where in which I want to add column which will indicate the order of the of the date for each user_id separately, so it will be like in TABLE2.
TABLE1:
user_id received_at action
0043e1a6-52e4 2018-01-05 12:32:10 A
2018-01-05 12:33:13 A
2018-01-05 12:42:12 B
0070f782-29f4 2018-01-06 01:41:18 A
2018-01-06 01:42:12 A
2018-01-06 01:43:11 B
2018-01-06 01:44:18 C
008aa58a-84a5 2018-01-06 14:22:13 A
2018-01-06 14:23:18 A
2018-01-06 14:24:13 A
2018-01-06 14:25:18 C
to something like this
TABLE2:
user_id received_at action order_n
0043e1a6-52e4 2018-01-05 12:32:10 A 1
2018-01-05 12:33:13 A 2
2018-01-05 12:42:12 B 3
0070f782-29f4 2018-01-06 01:41:18 A 1
2018-01-06 01:42:12 A 2
2018-01-06 01:43:11 B 3
2018-01-06 01:44:18 C 4
008aa58a-84a5 2018-01-06 14:22:13 A 1
2018-01-06 14:23:18 A 2
2018-01-06 14:24:13 A 3
2018-01-06 14:25:18 C 4
Are there are more efficient and maybe simpler solution than doing iteration for each user_id separately?
groupby user_id
and get the rank
using received_at
df['count_n'] = df.groupby('user_id').received_at.apply(pd.Series.rank)
This doesn't require a sorting step & will assign the correct rank even if the data frame is not sorted by received_at
within each group
if the column user_id
is set as an index (as your sample data seems to indicate), you could alternative use the following instead. Although, in recent versions of pandas, grouping by named indexes also works (i.e. the above might work)
df.groupby(level=0).received_at.apply(pd.Series.rank)