Tags: python-3.x, dataframe, time-series, anomaly-detection

How to return the first item when all items in a pandas DataFrame window are the same?


I am a Python beginner. I have the following pandas DataFrame with only two columns: "Time" and "Input".

I want to loop over the "Input" column with a window size of w = 3 (three consecutive values). For every window, I check whether all of its elements are 1's; if so, I keep the first element as 1 and change the remaining values to 0's.

index Time  Input
0     11      0
1     22      0
2     33      0
3     44      1
4     55      1
5     66      1
6     77      1
7     88      1
8     99      1
9   1010      0
10  1111      1
11  1212      1
12  1313      1
13  1414      0
14  1515      0

My intended output is as follows:

index   Time  Input  What_I_got  What_I_Want
    0     11      0           0            0
    1     22      0           0            0
    2     33      0           0            0
    3     44      1           1            1
    4     55      1           1            0
    5     66      1           1            0
    6     77      1           1            1
    7     88      1           0            0
    8     99      1           0            0
    9   1010      0           0            0
    10  1111      1           1            1
    11  1212      1           0            0
    12  1313      1           0            0
    13  1414      0           0            0
    14  1515      0           0            0

What should I do to get the desired output? Am I missing something in my code?


Solution

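Join the "Input" column into one string, replace each non-overlapping run of '111' with '100', then convert the characters back to integers:

    import re
    import pandas as pd

    pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)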
    Out[23]: 
    0     0
    1     0
    2     0
    3     1
    4     0
    5     0
    6     1
    7     0
    8     0
    9     0
    10    1
    11    0
    12    0
    13    0
    14    0
    dtype: int32
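
The one-liner above hard-codes a window of 3. A minimal sketch of the same regex idea with the window size as a parameter might look like the following. The function name `keep_first_of_window` and the `Result` column are made up for illustration, and the DataFrame is rebuilt from the "Input" column shown in the output table of the question.

```python
import re

import pandas as pd


def keep_first_of_window(values: pd.Series, w: int = 3) -> pd.Series:
    """Hypothetical helper: turn each non-overlapping run of w ones into 1 followed by w-1 zeros."""
    # Join the 0/1 values into a single string, e.g. "000111111011100".
    joined = "".join(values.astype(int).astype(str))
    # re.sub scans left to right and replaces non-overlapping matches only.
    replaced = re.sub("1" * w, "1" + "0" * (w - 1), joined)
    # Convert each character back to an integer, keeping the original index.
    return pd.Series([int(ch) for ch in replaced], index=values.index)


# Assumption: "Input" values are taken from the output table in the question.
df = pd.DataFrame({
    "Time": [11, 22, 33, 44, 55, 66, 77, 88, 99, 1010, 1111, 1212, 1313, 1414, 1515],
    "Input": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0],
})
df["Result"] = keep_first_of_window(df["Input"], w=3)  # "Result" is an illustrative column name
print(df["Result"].tolist())
```

Note that the string-join trick relies on every value being a single character (0 or 1), and that runs of 1's shorter than w are left unchanged.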