Search code examples
pythonpandasdataframedataset

How to filter out "flat lines" (values, that do not change for n rows)


I want to filter out data points who's values are equal for at least n rows.

Specifically, detecting equal values for at least n rows and filter out all the consecutive equal data too.

Here an example with n = 6 and the flat line is 8 rows long:

import pandas as pd
data={'col1':[1, 3, 3, 3, 3, 3, 3, 3, 3, 4, 1, 1, 1, 1, 1]}
df=pd.DataFrame(data,columns=['col1'])
print df


          col1
    0     1          
    1     3          
    2     3          
    3     3          
    4     3          
    5     3          
    6     3          
    7     3
    8     3
    9     4
    10    1
    11    1
    12    1
    13    1
    14    1

I want to get the following pd.Series "flatline_filter":

    s          flatline_filter  
    0     1    True     
    1     3    False     
    2     3    False    
    3     3    False     
    4     3    False     
    5     3    False     
    6     3    False     
    7     3    False
    8     3    False
    9     4    True
    10    1    True
    11    1    True
    12    1    True
    13    1    True
    14    1    True 

To the filter out those flat lines at some point:

df = df[flatline_filter]

Solution

  • Using a custom groupby.transform:

    n = 6
    
    df['flatline_filter'] = (df.groupby(df['col1'].diff().ne(0).cumsum())
                               .transform('size').lt(n)
                            )
    

    Output:

        col1  flatline_filter
    0      1             True
    1      3            False
    2      3            False
    3      3            False
    4      3            False
    5      3            False
    6      3            False
    7      3            False
    8      3            False
    9      4             True
    10     1             True
    11     1             True
    12     1             True
    13     1             True
    14     1             True