I have a Dataframe like this
oper_status
2012-01-01 00:26:54.250 0
2012-01-01 12:11:54.250 1
2012-01-01 13:57:54.250 2
2012-01-02 00:16:54.250 0
2012-01-02 14:26:54.250 1
2012-01-02 17:20:54.250 0
2012-01-04 08:21:54.250 0
2012-01-04 15:34:54.250 1
2012-01-04 19:45:54.250 0
2012-01-05 01:00:54.250 0
2012-01-05 12:46:54.250 1
2012-01-05 20:27:54.250 2
(...) (...)
and I want to count how many times every month I have consecutive values with this pattern: 0,1,2. I tried looping on the rows with iterrows() but it's very slow since I have a big dataset. I also thought about using "diff" but I can't figure out a simple way to do it. Thanks
EDIT: The expected output is like this
count
time
2012-03-31 244
2012-04-30 65
2012-05-31 167
2012-06-30 33
2012-07-31 187
... ...
2013-05-31 113
2013-06-30 168
2013-07-31 294
2013-08-31 178
2013-09-30 65
Counting sequential patterns is a two step process. First, build a sequence for each row, representing the pattern ending at that row:
df['seq'] = df.order_status.astype(str).shift(periods=0) + '-' +
df.order_status.astype(str).shift(periods=1) + '-' +
df.order_status.astype(str).shift(periods=2)
date order_status seq
0 2012-01-01 00:26:54.250 0 NaN
1 2012-01-01 12:11:54.250 1 NaN
2 2012-01-01 13:57:54.250 2 2-1-0
3 2012-01-02 00:16:54.250 0 0-2-1
4 2012-01-02 14:26:54.250 1 1-0-2
5 2012-01-02 17:20:54.250 0 0-1-0
6 2012-01-04 08:21:54.250 0 0-0-1
7 2012-01-04 15:34:54.250 1 1-0-0
8 2012-01-04 19:45:54.250 0 0-1-0
9 2012-01-05 01:00:54.250 0 0-0-1
10 2012-01-05 12:46:54.250 1 1-0-0
11 2012-01-05 20:27:54.250 2 2-1-0
Then, filter down to only the correct sequences and aggregate up to your desired level:
df['month'] = df.date.dt.month
df[df.seq == '2-1-0'].groupby("month").month.count()
month
1 2
Alter as necessary to handle cases where you want patterns starting in a certain period, stopping there, entirely within, etc...