Tags: python, python-3.x, dataframe, row

How to add new rows to Pandas data frame based on calculation using all rows for a given datetime?


I have the following data frame that consists of device electricity usage in my home broken up by hour.

             DateTime                   Name    kWh
0     1/31/2024 18:00              Sump Pump  0.003
1     1/31/2024 18:00              Furnace 2  0.027
2     1/31/2024 18:00              Furnace 1  0.211
3     1/31/2024 18:00                  Dryer  0.302
4     1/31/2024 18:00            Total Usage  1.927
5     1/31/2024 18:00              Always On  0.811
6     1/31/2024 18:00                 Heat 7  0.003
7     1/31/2024 18:00                 Fridge  0.030
8     1/31/2024 18:00                   Pump  0.069
9     1/31/2024 19:00              Sump Pump  0.002
10    1/31/2024 19:00            Total Usage  1.593
11    1/31/2024 19:00              Always On  0.828
12    1/31/2024 19:00                 Heat 7  0.001
13    1/31/2024 19:00                 Fridge  0.020
14    1/31/2024 19:00                   Pump  0.070

I would like to add a row for every hour where Name = "Other" and kWh equals the "Total Usage" kWh minus the sum of kWh for all other rows in that hour, so that the resulting data frame looks like the following. The Names, order, and number of rows for each hour will vary. (Note that I added 2 rows where Name = "Other".)

             DateTime                   Name    kWh
0     1/31/2024 18:00              Sump Pump  0.003
1     1/31/2024 18:00              Furnace 2  0.027
2     1/31/2024 18:00              Furnace 1  0.211
3     1/31/2024 18:00                  Dryer  0.302
4     1/31/2024 18:00            Total Usage  1.927
5     1/31/2024 18:00              Always On  0.811
6     1/31/2024 18:00                 Heat 7  0.003
7     1/31/2024 18:00                 Fridge  0.030
8     1/31/2024 18:00                   Pump  0.069
9     1/31/2024 18:00                  Other  0.471
10    1/31/2024 19:00              Sump Pump  0.002
11    1/31/2024 19:00            Total Usage  1.593
12    1/31/2024 19:00              Always On  0.828
13    1/31/2024 19:00                 Heat 7  0.001
14    1/31/2024 19:00                 Fridge  0.020
15    1/31/2024 19:00                   Pump  0.070
16    1/31/2024 19:00                  Other  0.672

Thanks in advance for your help!


Solution

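  • You could group on time, sum the kWh for each group, and append one "Other" row per group. The simplified code below shows the basic approach; note that it appends the plain group sum, so for the real data you would still subtract the non-total rows from "Total Usage":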

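    import pandas as pd
    
    df = pd.DataFrame({'time': [18, 18, 19, 19],
                       'Name': ['a', 'z', 'a', 'z'],
                       'kWh': [1.2, 3.2, 4.2, 5.1]})
    
    # One row per time holding the summed kWh (this frame has no 'Name' column)
    df2 = df.groupby('time', as_index=False)['kWh'].sum()
    
    # Append the sums; the missing 'Name' values become 'Other'.
    # A stable sort keeps each 'Other' row after its group's original rows.
    df3 = (pd.concat([df, df2])
             .fillna({'Name': 'Other'})
             .sort_values('time', kind='stable')
             .reset_index(drop=True))
    
    print(df3)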
    

    which gives:

       time   Name  kWh
    0    18      a  1.2
    1    18      z  3.2
    2    18  Other  4.4
    3    19      a  4.2
    4    19      z  5.1
    5    19  Other  9.3
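
To get the exact result asked for in the question ("Total Usage" minus the sum of the other rows for each hour), the same group-and-concat idea could be adapted roughly as follows. This is only a sketch under a few assumptions: each hour has exactly one "Total Usage" row, the column names are DateTime, Name and kWh as in the question, and rounding to 3 decimals is acceptable (the rounding is my addition, to hide floating-point noise).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "DateTime": ["1/31/2024 18:00"] * 9 + ["1/31/2024 19:00"] * 6,
    "Name": ["Sump Pump", "Furnace 2", "Furnace 1", "Dryer", "Total Usage",
             "Always On", "Heat 7", "Fridge", "Pump",
             "Sump Pump", "Total Usage", "Always On", "Heat 7", "Fridge", "Pump"],
    "kWh": [0.003, 0.027, 0.211, 0.302, 1.927, 0.811, 0.003, 0.030, 0.069,
            0.002, 1.593, 0.828, 0.001, 0.020, 0.070],
})

is_total = df["Name"].eq("Total Usage")

# Per-hour total, and per-hour sum of every row that is not the total
total = df[is_total].groupby("DateTime")["kWh"].sum()
devices = df[~is_total].groupby("DateTime")["kWh"].sum()

# Other = Total Usage - sum of the remaining rows.
# reindex(..., fill_value=0) covers hours that only have a "Total Usage" row.
other = (total - devices.reindex(total.index, fill_value=0)).round(3)
other = other.rename("kWh").reset_index()
other["Name"] = "Other"

# Append the new rows, then sort by the parsed timestamp.
# mergesort is stable, so each "Other" row lands after its hour's original rows.
result = (pd.concat([df, other], ignore_index=True)
            .sort_values("DateTime", key=pd.to_datetime, kind="mergesort")
            .reset_index(drop=True))

print(result)
```

With the sample data this should append an "Other" row of 0.471 for 18:00 and 0.672 for 19:00, matching the desired output in the question. If an hour could contain more than one "Total Usage" row, the `total` step would add them together, so you may want to validate that first.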