I want to filter out data points who's values are equal for at least n rows.
Specifically, detecting equal values for at least n rows and filter out all the consecutive equal data too.
Here an example with n = 6 and the flat line is 8 rows long:
import pandas as pd
data={'col1':[1, 3, 3, 3, 3, 3, 3, 3, 3, 4, 1, 1, 1, 1, 1]}
df=pd.DataFrame(data,columns=['col1'])
print df
col1
0 1
1 3
2 3
3 3
4 3
5 3
6 3
7 3
8 3
9 4
10 1
11 1
12 1
13 1
14 1
I want to get the following pd.Series "flatline_filter":
s flatline_filter
0 1 True
1 3 False
2 3 False
3 3 False
4 3 False
5 3 False
6 3 False
7 3 False
8 3 False
9 4 True
10 1 True
11 1 True
12 1 True
13 1 True
14 1 True
To the filter out those flat lines at some point:
df = df[flatline_filter]
Using a custom groupby.transform
:
n = 6
df['flatline_filter'] = (df.groupby(df['col1'].diff().ne(0).cumsum())
.transform('size').lt(n)
)
Output:
col1 flatline_filter
0 1 True
1 3 False
2 3 False
3 3 False
4 3 False
5 3 False
6 3 False
7 3 False
8 3 False
9 4 True
10 1 True
11 1 True
12 1 True
13 1 True
14 1 True