I have a list of activities and the approximate timestamp they occur in. I would like to count the occurences of a string in the previous 'x' rows (walking or running etc.) and add it to the dataframe. Pandas DataFrame does not support rolling (for non-numeric data) and I'm not sure if I can use shift to check like the previous 30, 50 or even 70 rows of data. I haven't made any concrete progress yet as I have been looking for similar questions/solutions on the site.
timestamp event
0 2021-12-18 18:20:25+08:00 running
1 2021-12-18 18:20:27+08:00 running
2 2021-12-18 18:20:29+08:00 walking
3 2021-12-18 18:20:31+08:00 walking
4 2021-12-18 18:20:33+08:00 walking
5 2021-12-18 18:20:35+08:00 walking
6 2021-12-18 18:20:37+08:00 walking
7 2021-12-18 18:20:39+08:00 walking
8 2021-12-18 18:20:41+08:00 stationary
9 2021-12-18 18:20:43+08:00 stationary
10 2021-12-18 18:20:45+08:00 stationary
11 2021-12-18 18:20:47+08:00 stationary
df.loc[:, 'Count previous K'] = 0 # new column to count previous row activities
expected output:
timestamp event Count previous K
0 2021-12-18 18:20:25+08:00 running 0
1 2021-12-18 18:20:27+08:00 running 0
2 2021-12-18 18:20:29+08:00 walking 1
3 2021-12-18 18:20:31+08:00 walking 2
4 2021-12-18 18:20:33+08:00 walking 3
5 2021-12-18 18:20:35+08:00 walking 4
6 2021-12-18 18:20:37+08:00 walking 5
7 2021-12-18 18:20:39+08:00 walking 6
8 2021-12-18 18:20:41+08:00 stationary 6
9 2021-12-18 18:20:43+08:00 stationary 6
10 2021-12-18 18:20:45+08:00 stationary 6
11 2021-12-18 18:20:47+08:00 stationary 6
12 2021-12-18 18:20:49+08:00 stationary 5
for a window size of 10 (including current index/row) counting occurences of walking.
You can use a boolean to see when a particular event is occurring, then perform a rolling sum on the boolean series. As @mozway pointed out, the argument min_periods=1
will avoid NaN
appearing at the beginning of the resulting DataFrame:
df['walking_count'] = (df['event'] == 'walking').rolling(5, min_periods=1).sum()
This sets a new column 'walking_count'
to the following series:
0 0.0
1 0.0
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 5.0
8 4.0
9 3.0
10 2.0
11 1.0