Search code examples
pythonpandasdataframepandas-groupby

how to create various dummies based on consecutive values from another column


I have the following panel dataset. "winner" =1 if in period (date), someone is a winner, zero if loser.

ID   date  winner 
A   2017Q4  NaN
A   2018Q4   1         
A   2019Q4   0   
A   2020Q4   0    
A   2021Q4   1   
B   2017Q4  NaN 
B   2018Q4   1   
B   2019Q4   1   
B   2020Q4   0    
B   2021Q4   0   
C   2017Q4  NaN
C   2018Q4   0      
C   2019Q4   0    
C   2020Q4   0  
C   2021Q4   0 
D   2017Q4  NaN   
D   2018Q4   0                 
D   2019Q4   1   
D   2020Q4   1 
D   2021Q4   1 

I want to create four dummy variables, WW =1 if someone is winner in two consecutive periods. LL=1 if loser in two consecutive periods. WL if winner in period 1 and loser the next period, and LW vice versa.

UPDATE

when i apply the answers below i get the following

ID   date  winner  WW  LL  WL  LW
A   2017Q4  NaN
A   2018Q4   1     0   0   0   0  
A   2019Q4   0     0   0   1   0    
A   2020Q4   0     0   1   0   0  
A   2021Q4   1     0   0   0   1
B   2017Q4  NaN
B   2018Q4   1     0   0   0   0   
B   2019Q4   1     1   0   0   0
B   2020Q4   0     0   0   1   0
B   2021Q4   0     0   1   0   0
C   2017Q4  NaN
C   2018Q4   0     0   0   0   0      
C   2019Q4   0     0   1   0   0
C   2020Q4   0     0   1   0   0
C   2021Q4   0     0   1   0   0
D   2017Q4  NaN
D   2018Q4   0     0   0   0   0                
D   2019Q4   1     0   0   0   1
D   2020Q4   1     1   0   0   0
D   2021Q4   1     1   0   0   0

How do i make sure I get the NaN when the previous value is NaN? desired output

ID   date  winner  WW  LL  WL  LW
A   2017Q4  NaN
A   2018Q4   1    NaN NaN NaN NaN   
A   2019Q4   0     0   0   1   0    
A   2020Q4   0     0   1   0   0  
A   2021Q4   1     0   0   0   1
B   2017Q4  NaN
B   2018Q4   1    NaN NaN NaN NaN  
B   2019Q4   1     1   0   0   0
B   2020Q4   0     0   0   1   0
B   2021Q4   0     0   1   0   0
C   2017Q4  NaN
C   2018Q4   0    NaN NaN NaN NaN    
C   2019Q4   0     0   1   0   0
C   2020Q4   0     0   1   0   0
C   2021Q4   0     0   1   0   0
D   2017Q4  NaN
D   2018Q4   0    NaN NaN NaN NaN               
D   2019Q4   1     0   0   0   1
D   2020Q4   1     1   0   0   0
D   2021Q4   1     1   0   0   0

How to do this in the most simple way?


Solution

    1. map 1 and 0 to "W" and "L"
    2. get the 2-period streak
    3. get_dummies for the "streak"
    4. join to original DataFrame ignoring the first row of each ID
    wins = df["winner"].fillna(0).map({1:"W",0:"L"})
    streaks = wins.shift() + wins
    other = pd.get_dummies(streaks.where(df["ID"].eq(df["ID"].shift())))
    output = df.join(other.where(df["ID"].duplicated()&df["winner"].shift().notna()))
    
    >>> output
    
       ID    date  winner   LL   LW   WL   WW
    0   A  2017Q4     NaN  NaN  NaN  NaN  NaN
    1   A  2018Q4     1.0  NaN  NaN  NaN  NaN
    2   A  2019Q4     0.0  0.0  0.0  1.0  0.0
    3   A  2020Q4     0.0  1.0  0.0  0.0  0.0
    4   A  2021Q4     1.0  0.0  1.0  0.0  0.0
    5   B  2017Q4     NaN  NaN  NaN  NaN  NaN
    6   B  2018Q4     1.0  NaN  NaN  NaN  NaN
    7   B  2019Q4     1.0  0.0  0.0  0.0  1.0
    8   B  2020Q4     0.0  0.0  0.0  1.0  0.0
    9   B  2021Q4     0.0  1.0  0.0  0.0  0.0
    10  C  2017Q4     NaN  NaN  NaN  NaN  NaN
    11  C  2018Q4     0.0  NaN  NaN  NaN  NaN
    12  C  2019Q4     0.0  1.0  0.0  0.0  0.0
    13  C  2020Q4     0.0  1.0  0.0  0.0  0.0
    14  C  2021Q4     0.0  1.0  0.0  0.0  0.0
    15  D  2017Q4     NaN  NaN  NaN  NaN  NaN
    16  D  2018Q4     0.0  NaN  NaN  NaN  NaN
    17  D  2019Q4     1.0  0.0  1.0  0.0  0.0
    18  D  2020Q4     1.0  0.0  0.0  0.0  1.0
    19  D  2021Q4     1.0  0.0  0.0  0.0  1.0