Tags: python, pandas, dataframe, group-by, pivot

Pivot and groupby to reformat data with pandas in python


I have a data series in dataframe format with a datetime index. Currently the dataframe has only a single column, which I would like to reformat so that each month becomes a column (i.e. 1, 2, 3, 4, ...) while keeping the time index. I would then like to group by hour so that each hour is averaged over the month. I have provided my code below. Essentially, I'd like to create a dataframe that holds the average of the hourly data for each month. How can I pivot the data to have the hour for each month in the index and the month as the column name?

import pandas as pd
import numpy as np

dti = pd.date_range("2018-01-01", periods=8760, freq="H")

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(dti))
data1 = np.random.randint(1, high=100, size=len(dti))
data2 = np.random.randint(1, high=100, size=len(dti))
data3 = np.random.randint(1, high=100, size=len(dti))
#print(data)

df = pd.DataFrame({'date': dti, 'data': data, 'data1': data1, 'data2': data2, 'data3': data3 })
df = df.set_index('date')
#print(df)

df1 = df[['data']]
pd.pivot(df1,index=df1.index.hour,columns=df1.index.month)  # this is the line that fails
df1 = df1.groupby(df1.index.hour).mean()
print(df1)

I am currently able to obtain the results for a full year, shown with the code below, if I comment out the line "pd.pivot(df1,index=df1.index.hour,columns=df1.index.month)", but I'd like to have 11 other columns, one for each month.

import pandas as pd
import numpy as np

dti = pd.date_range("2018-01-01", periods=8760, freq="H")

np.random.seed(seed=1111)
data1 = np.random.randint(1, high=100, size=len(dti))
data2 = np.random.randint(1, high=100, size=len(dti))
data3 = np.random.randint(1, high=100, size=len(dti))
data4 = np.random.randint(1, high=100, size=len(dti))
#print(data1)

df = pd.DataFrame({'date': dti, 'data1': data1, 'data2': data2, 'data3': data3, 'data4': data4 })
df = df.set_index('date')
#print(df)

df1 = df[['data1']]
#pd.pivot(df1,index=df1.index.hour,columns=df1.index.month, values=df1[["data1"]])
df1 = df1.groupby(df1.index.hour).mean()
#print(df1)

Output:

          data1
date           
0     47.873973
1     49.454795
2     50.810959
3     48.726027
4     49.654795
5     49.745205
6     50.342466
7     47.753425
8     46.515068
9     49.364384
10    53.583562
11    49.983562
12    50.178082
13    51.049315
14    49.813699
15    51.032877
16    49.501370
17    49.641096
18    49.463014
19    48.504110
20    49.621918
21    51.060274
22    48.736986
23    50.139726

Solution

  • You can extract the hour and month from the timestamps and store them as separate columns, then group the data by hour and month and calculate the average for each group. Finally, you can unstack the grouped data to get hours as rows and months as columns. Here's how you can achieve that:

    import pandas as pd
    import numpy as np
    
    # 1. Data Initialization
    # Create hourly timestamps for a year
    dti = pd.date_range("2018-01-01", periods=8760, freq="H")
    
    # Generate random data for the column 'data1'
    np.random.seed(seed=1111)
    data1 = np.random.randint(1, high=100, size=len(dti))
    
    # Create the DataFrame
    df = pd.DataFrame({'data1': data1}, index=dti)
    
    # 2. Hour & Month Extraction
    # Extract hour and month from the timestamp and store as separate columns
    
    df['hour'] = df.index.hour
    df['month'] = df.index.month
    
    # 3. Grouping & Averaging
    # Group by hour and month and compute the average for 'data1'
    
    grouped_data = df.groupby(['hour', 'month'])['data1'].mean()
    
    # 4. Unstacking/Pivoting
    # Unstack the grouped data to have hours as rows and months as columns
    unstack_data = grouped_data.unstack()
    
    print(unstack_data)
    

    which results in

    
    month         1          2          3          4          5          6   \
    hour                                                                      
    0      52.516129  43.214286  47.000000  44.433333  41.129032  43.866667   
    1      47.838710  57.678571  49.322581  52.200000  58.419355  52.000000   
    2      53.193548  54.714286  53.064516  46.666667  47.354839  45.466667   
    3      49.677419  50.821429  45.903226  48.700000  49.096774  47.400000   
    4      49.935484  40.678571  45.580645  46.966667  49.451613  55.833333   
    5      46.129032  43.607143  49.387097  50.266667  46.645161  49.966667   
    6      49.419355  50.250000  40.838710  54.366667  52.741935  56.466667   
    7      42.322581  43.642857  54.193548  39.233333  46.419355  38.766667   
    8      43.000000  49.357143  52.580645  38.866667  46.225806  49.666667   
    9      48.096774  48.428571  45.419355  44.433333  56.225806  48.200000   
    10     61.096774  52.714286  51.645161  59.500000  50.322581  58.300000   
    11     46.258065  46.178571  58.032258  47.900000  53.322581  50.866667   
    12     48.612903  47.071429  51.741935  55.333333  42.193548  57.966667   
    13     55.645161  52.892857  42.419355  51.966667  50.129032  55.800000   
    14     47.225806  55.464286  53.225806  58.233333  48.677419  47.033333   
    15     50.548387  57.928571  48.129032  43.333333  47.064516  59.466667   
    16     42.032258  53.928571  47.516129  47.966667  46.774194  48.333333   
    17     51.064516  46.535714  54.000000  51.233333  59.064516  44.100000   
    18     43.483871  51.607143  46.903226  56.733333  37.258065  50.366667   
    19     40.709677  34.392857  42.161290  52.733333  50.580645  50.466667   
    20     54.000000  51.535714  55.709677  46.600000  48.516129  41.733333   
    21     51.258065  58.000000  47.612903  40.766667  50.322581  50.866667   
    22     55.000000  47.642857  52.612903  49.566667  46.548387  41.400000   
    23     47.838710  49.285714  56.935484  47.800000  46.322581  53.800000   
    
    month         7          8          9          10         11         12  
    hour                                                                     
    0      54.903226  47.967742  49.533333  53.322581  50.700000  45.354839  
    1      38.483871  48.064516  45.200000  49.032258  47.500000  48.483871  
    2      48.258065  46.290323  54.133333  52.838710  47.100000  60.709677  
    3      54.225806  51.903226  45.666667  47.677419  45.800000  47.806452  
    4      51.903226  47.225806  50.666667  46.161290  48.600000  62.096774  
    5      50.193548  48.483871  52.566667  50.516129  56.766667  52.161290  
    6      48.354839  53.935484  54.300000  47.870968  44.300000  51.516129  
    7      58.548387  48.451613  56.966667  52.322581  46.133333  45.322581  
    8      42.225806  41.193548  47.966667  49.032258  46.400000  51.838710  
    9      42.322581  44.967742  56.200000  56.483871  41.333333  59.935484  
    10     49.419355  49.032258  57.600000  50.387097  58.900000  44.645161  
    11     51.000000  48.516129  46.100000  53.193548  51.333333  46.612903  
    12     58.580645  40.193548  46.933333  57.290323  43.466667  52.548387  
    13     53.677419  50.741935  49.833333  45.870968  54.966667  49.096774  
    14     48.709677  48.774194  51.133333  44.870968  48.866667  46.290323  
    15     47.774194  54.967742  51.633333  51.903226  46.566667  53.645161  
    16     59.709677  57.000000  45.533333  46.258065  52.533333  46.741935  
    17     39.419355  49.419355  49.233333  51.064516  53.000000  47.225806  
    18     50.935484  54.258065  54.600000  58.258065  44.233333  45.387097  
    19     54.774194  53.161290  50.066667  52.709677  55.366667  44.032258  
    20     47.709677  46.741935  46.366667  57.387097  46.700000  52.096774  
    21     52.258065  59.129032  50.366667  59.419355  43.066667  49.709677  
    22     50.935484  44.225806  56.966667  41.387097  44.000000  54.354839  
    23     55.129032  44.516129  52.400000  45.774194  48.233333  53.612903  
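
    As an aside, if you'd rather avoid the helper columns, pivot_table accepts plain arrays for its index and columns arguments, so the same table can be built in one call. A minimal sketch of that variant (the names df and result here are just for illustration):

    import pandas as pd
    import numpy as np

    dti = pd.date_range("2018-01-01", periods=8760, freq="H")
    np.random.seed(seed=1111)
    df = pd.DataFrame({'data1': np.random.randint(1, high=100, size=len(dti))},
                      index=dti)

    # pivot_table groups and averages in one step ('mean' is its default aggfunc)
    result = df.pivot_table(values='data1',
                            index=df.index.hour,
                            columns=df.index.month,
                            aggfunc='mean')
    result.index.name = 'hour'
    result.columns.name = 'month'
    print(result)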
    
    

    If the output is not what you wanted, please provide the expected output for more clarity.
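
    If you also want the other data columns from your second snippet (data2, data3, data4) reshaped the same way, a single grouped mean over the whole frame followed by unstack gives you a (column, month) MultiIndex in the columns. A sketch, assuming the four-column frame from your question:

    import pandas as pd
    import numpy as np

    dti = pd.date_range("2018-01-01", periods=8760, freq="H")
    np.random.seed(seed=1111)
    df = pd.DataFrame({f'data{i}': np.random.randint(1, high=100, size=len(dti))
                       for i in range(1, 5)}, index=dti)

    # group every column by (hour, month) at once, then move month into the columns
    wide = df.groupby([df.index.hour, df.index.month]).mean().unstack()
    wide.index.name = 'hour'

    # slicing one column back out should reproduce the single-column table above
    print(wide['data1'])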