Add a new column with a dummy variable for a complete group based on a condition


I have the following dataframe:

import pandas as pd

df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'],
            "date": [2015, 2016,  2017, 2018,
                     2015, 2016,  2017, 2018, 
                     2015, 2016,  2017, 2018],
            "col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})

I want to add a new column, "col_2", that is 1 for every row of a group (by "id") if that group's "col_1" contains three consecutive 1s, and 0 otherwise. The resulting dataframe should be:

df_new = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'  ], 
            "date": [2015, 2016,  2017, 2018,
                     2015, 2016,  2017, 2018, 
                     2015, 2016,  2017, 2018],                
            "col_1": [1,1,1,0,1,0,1,1,0,1,1,1],
            "col_2": [1,1,1,1,0,0,0,0,1,1,1,1]})

Solution

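  • One way is to take a rolling sum over a window of 3 within each group. The sum equals 3 only where three consecutive values are 1, so .any() tells you whether the group has such a run, and transform broadcasts that result to every row of the group: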

    df['col_2'] = (df.groupby('id')['col_1']
                     .transform(lambda x: x.rolling(3).sum().eq(3).any())
                     .astype(int))
    df
    

    Output:

       id  date  col_1  col_2
    0   A  2015      1      1
    1   A  2016      1      1
    2   A  2017      1      1
    3   A  2018      0      1
    4   B  2015      1      0
    5   B  2016      0      0
    6   B  2017      1      0
    7   B  2018      1      0
    8   C  2015      0      1
    9   C  2016      1      1
    10  C  2017      1      1
    11  C  2018      1      1
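
If the run length needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name flag_groups_with_run and its parameters are made up for illustration, and it assumes "col_1" holds only 0/1 values and that rows are already ordered by date within each id.

```python
import pandas as pd

df = pd.DataFrame({
    "id": list("AAAABBBBCCCC"),
    "date": [2015, 2016, 2017, 2018] * 3,
    "col_1": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
})


def flag_groups_with_run(frame, group_col, value_col, n=3):
    """Hypothetical helper: return a 0/1 Series that is 1 for every row of a
    group whose value_col contains at least n consecutive 1s.

    Assumes value_col is 0/1 and rows are already in the desired order.
    """
    def has_run(s):
        # A window of n values sums to n only if all n of them are 1.
        return s.rolling(n).sum().eq(n).any()

    # transform broadcasts the per-group boolean back to each row of the group.
    return frame.groupby(group_col)[value_col].transform(has_run).astype(int)


df["col_2"] = flag_groups_with_run(df, "id", "col_1", n=3)
print(df)
```

If the rows are not guaranteed to be in date order, sorting with df.sort_values(["id", "date"]) before calling the helper would be needed for "consecutive" to mean consecutive years.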