Tags: python, pandas, datetime, time-series, dummy-variable

How to add empty/dummy row with continuous datetime index in pandas?


This is my DataFrame:

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0

I want output like this:

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0
*2022-10-01 00:00:00+02:00           00.0   00.0*
*2022-10-01 01:00:00+02:00           00.0   01.0*

My index is a datetime (start_time). I want to append rows that continue the datetime sequence, with dummy or zero values. How can I do this in pandas?


Solution

  • Create a helper DataFrame and append it to the original with concat:

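    import pandas as pd

    N = 2  # number of hourly rows to append
    # Helper frame: N hourly timestamps after the last one, consumption set to 0.
    # Note: pandas 2.2+ deprecates the 'H' alias in favour of 'h'.
    df1 = (pd.DataFrame({'consumption':0}, 
                         index=pd.date_range(df.index.max() + pd.Timedelta('1h'),
                               df.index.max() + pd.Timedelta(f'{N}h'),
                               freq='H'))
              .assign(hour=lambda x: x.index.hour))  # derive hour from the new index
    
    df = pd.concat([df, df1])
    print(df)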
                               consumption  hour
    2022-09-30 14:00:00+02:00        199.0  14.0
    2022-09-30 15:00:00+02:00        173.0  15.0
    2022-09-30 16:00:00+02:00        173.0  16.0
    2022-09-30 17:00:00+02:00        156.0  17.0
    2022-09-30 18:00:00+02:00        142.0  18.0
    2022-09-30 19:00:00+02:00        163.0  19.0
    2022-09-30 20:00:00+02:00        138.0  20.0
    2022-09-30 21:00:00+02:00        183.0  21.0
    2022-09-30 22:00:00+02:00        138.0  22.0
    2022-09-30 23:00:00+02:00        143.0  23.0
    2022-10-01 00:00:00+02:00          0.0   0.0
    2022-10-01 01:00:00+02:00          0.0   1.0
    

    Or use DataFrame.reindex with a new index that extends the original by N hours (any hours missing inside the original range are filled with 0 as well):

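    N = 2  # number of hourly rows to append
    # Reindex onto the full hourly range; timestamps not in df get fill_value=0.
    df = (df.reindex(pd.date_range(df.index.min(), 
                                   df.index.max() + pd.Timedelta(f'{N}h'), 
                                   freq='H'), fill_value=0)
            .assign(hour=lambda x: x.index.hour))  # recompute hour for every row
    
    print(df)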
                               consumption  hour
    2022-09-30 14:00:00+02:00        199.0    14
    2022-09-30 15:00:00+02:00        173.0    15
    2022-09-30 16:00:00+02:00        173.0    16
    2022-09-30 17:00:00+02:00        156.0    17
    2022-09-30 18:00:00+02:00        142.0    18
    2022-09-30 19:00:00+02:00        163.0    19
    2022-09-30 20:00:00+02:00        138.0    20
    2022-09-30 21:00:00+02:00        183.0    21
    2022-09-30 22:00:00+02:00        138.0    22
    2022-09-30 23:00:00+02:00        143.0    23
    2022-10-01 00:00:00+02:00          0.0     0
    2022-10-01 01:00:00+02:00          0.0     1
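
For anyone who wants to run the concat approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's printout, assuming a fixed +02:00 offset (the real data may use a named time zone). The helper name `append_empty_hours` and its `fill_value` parameter are hypothetical, not part of pandas. It uses `periods=n` instead of computing the end timestamp, and passes a `Timedelta` as `freq` so no frequency alias string is needed.

```python
import pandas as pd

# Sample data reconstructed from the question (fixed +02:00 offset is an assumption).
start = pd.Timestamp("2022-09-30 14:00:00+02:00")
idx = pd.date_range(start, periods=10, freq=pd.Timedelta(hours=1), name="start_time")
df = pd.DataFrame(
    {
        "consumption": [199.0, 173.0, 173.0, 156.0, 142.0,
                        163.0, 138.0, 183.0, 138.0, 143.0],
        "hour": idx.hour.to_numpy(dtype=float),
    },
    index=idx,
)


def append_empty_hours(frame, n, fill_value=0):
    """Hypothetical helper: append n hourly rows after the last timestamp."""
    step = pd.Timedelta(hours=1)
    # n consecutive hourly timestamps, starting one hour after the current maximum.
    new_index = pd.date_range(frame.index.max() + step, periods=n, freq=step)
    # Every column gets fill_value; 'hour' is then derived from the new index.
    extra = pd.DataFrame(fill_value, index=new_index, columns=frame.columns)
    extra["hour"] = new_index.hour
    out = pd.concat([frame, extra])
    out.index.name = frame.index.name  # keep the original index name
    return out


out = append_empty_hours(df, 2)
print(out.tail(3))
```

Unlike the reindex variant, this only appends rows at the end and leaves any gaps inside the existing range untouched.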