Search code examples
pythonpandasmoving-averagerolling-computation

Compare a pandas moving window to a list to find the window with least error


I have reduced my data set to the last few steps. My pandas dataframe looks like this

    FAC
0   1
1   2
2   1
3   3
4   2
5   1
6   2
7   1
8   1
9   3
10  2
11  1
12  2
13  3
14  1

I also have a list that I have identified to match.

match_list = [1, 2, 1, 1, 3]

what I am looking for is the slide through (5 item window) the data frame column and spot the row that matched the list pattern. The final result is something that looks like this. I will be thankful for any help.

    FAC Error
0   1   some val
1   2   some val
2   1   some val
3   3   some val
4   2   some val
5   1   some val
6   2   some val
7   1   0
8   1   some val
9   3   some val
10  2   some val
11  1   some val
12  2   some val
13  3   some val
14  1   some val

Solution

  • This can be done with rolling:

    match_list = [1, 2, 1, 1, 3]
    match_list = np.array(match_list)
    
    def match(x):
        return (len(x)==len(match_list) and (x==match_list).all())
    
    
    df['error'] = np.where(df.FAC.rolling(5, center=True).apply(match)==1, 0, 'some value')
    

    Output:

        FAC       error
    0     1  some value
    1     2  some value
    2     1  some value
    3     3  some value
    4     2  some value
    5     1  some value
    6     2  some value
    7     1           0
    8     1  some value
    9     3  some value
    10    2  some value
    11    1  some value
    12    2  some value
    13    3  some value
    14    1  some value
    

    Update: to count the match, you can simply do mean instead of all inside the function:

    def count_match(x):
        return (len(x)==len(match_list))* (x==match_list).mean()
    
    df['error'] = df.FAC.rolling(5,center=True).apply(count_match)
    

    Output:

        FAC  error
    0     1    NaN
    1     2    NaN
    2     1    0.6
    3     3    0.0
    4     2    0.4
    5     1    0.4
    6     2    0.2
    7     1    1.0
    8     1    0.2
    9     3    0.2
    10    2    0.4
    11    1    0.6
    12    2    0.0
    13    3    NaN
    14    1    NaN