
python: creating dummy variables per date


I have the following dataframe:

id start_date end_date value 
1  2010       2014     A
2  2010       2013     C
3  2012       2013     D

and I want to create one column per year, filled with the value for every year from start_date to end_date:

id 2010 2011 2012 2013 2014
1  A    A    A    A    A
2  C    C    C    C    Na
3  Na   Na   D    D    Na  

Solution

  • Create a range of years per row with a list comprehension, then use DataFrame.explode and DataFrame.pivot:

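    # One range of years per row, covering start_date through end_date inclusive
    df['y'] = [range(s, e+1) for s, e in zip(df['start_date'], df['end_date'])]
    # One row per (id, year), then reshape the years into columns;
    # pivot takes keyword-only arguments in pandas 2.0+
    df = df.explode('y').pivot(index='id', columns='y', values='value')
    print(df)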
    y  2010 2011 2012 2013 2014
    id                         
    1     A    A    A    A    A
    2     C    C    C    C  NaN
    3   NaN  NaN    D    D  NaN
    

    If performance is important, use Index.repeat with GroupBy.cumcount instead:

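    # Repeat each row once per year in its range
    df = df.loc[df.index.repeat(df['end_date'].sub(df['start_date']).add(1))]
    # Year = start_date + position of the row within its repeated group
    df['y'] = df['start_date'].add(df.groupby(level=0).cumcount())

    df = df.pivot(index='id', columns='y', values='value')
    print(df)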
    y  2010 2011 2012 2013 2014
    id                         
    1     A    A    A    A    A
    2     C    C    C    C  NaN
    3   NaN  NaN    D    D  NaN
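
For anyone who wants to run both approaches side by side, here is a minimal self-contained sketch. It rebuilds the sample frame from the question. The helper names expand_with_explode and expand_with_repeat are made up for illustration and are not pandas functions, and the cast of y to int64 is an extra step that is not in the answer above, added so the column labels are plain integers in both results.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": [2010, 2010, 2012],
    "end_date": [2014, 2013, 2013],
    "value": ["A", "C", "D"],
})


def expand_with_explode(frame):
    """Hypothetical helper: list of years per row, explode, then pivot."""
    out = frame.copy()
    # One list of years per row, end_date inclusive.
    out["y"] = [
        list(range(s, e + 1))
        for s, e in zip(out["start_date"], out["end_date"])
    ]
    out = out.explode("y")
    # explode leaves an object column; cast so the labels are integers.
    out["y"] = out["y"].astype("int64")
    return out.pivot(index="id", columns="y", values="value")


def expand_with_repeat(frame):
    """Hypothetical helper: repeat rows, number them, then pivot."""
    counts = frame["end_date"].sub(frame["start_date"]).add(1)
    out = frame.loc[frame.index.repeat(counts)].copy()
    # cumcount numbers the copies of each original row 0, 1, 2, ...
    out["y"] = out["start_date"].add(out.groupby(level=0).cumcount())
    return out.pivot(index="id", columns="y", values="value")


wide_a = expand_with_explode(df)
wide_b = expand_with_repeat(df)
print(wide_a)
```

Both helpers should give the same wide table for this sample. Which one is faster on real data is worth measuring rather than assuming.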