I have a data series in a dataframe with a datetime index. Currently the dataframe has only a single column, which I would like to reshape so that each month becomes a column (i.e. 1, 2, 3, 4, ...) while keeping the time index. I would then like to group by hour so that each hour is averaged over the month. I have provided my code below. Essentially I'd like to create a dataframe that holds the average of the hourly data for each month. How can I pivot the data to have the hour in the index and the month as the column name?
import pandas as pd
import numpy as np
dti = pd.date_range("2018-01-01", periods=8760, freq="H")
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(dti))
data1 = np.random.randint(1, high=100, size=len(dti))
data2 = np.random.randint(1, high=100, size=len(dti))
data3 = np.random.randint(1, high=100, size=len(dti))
#print(data)
df = pd.DataFrame({'date': dti, 'data': data, 'data1': data1, 'data2': data2, 'data3': data3 })
df = df.set_index('date')
#print(df)
df1 = df[['data']]
pd.pivot(df1,index=df1.index.hour,columns=df1.index.month)
df1 = df1.groupby(df1.index.hour).mean()
print(df1)
I am currently able to obtain the results for a full year if I comment out the line "pd.pivot(df1,index=df1.index.hour,columns=df1.index.month)", but I'd like to have 11 additional columns, one per month, as shown with the code below.
import pandas as pd
import numpy as np
dti = pd.date_range("2018-01-01", periods=8760, freq="H")
np.random.seed(seed=1111)
data1 = np.random.randint(1, high=100, size=len(dti))
data2 = np.random.randint(1, high=100, size=len(dti))
data3 = np.random.randint(1, high=100, size=len(dti))
data4 = np.random.randint(1, high=100, size=len(dti))
#print(data)
df = pd.DataFrame({'date': dti, 'data1': data1, 'data2': data2, 'data3': data3, 'data4': data4 })
df = df.set_index('date')
#print(df)
df1 = df[['data1']]
#pd.pivot(df1,index=df1.index.hour,columns=df1.index.month, values=df1[["data1"]])
df1 = df1.groupby(df1.index.hour).mean()
#print(df1)
Output:
data1
date
0 47.873973
1 49.454795
2 50.810959
3 48.726027
4 49.654795
5 49.745205
6 50.342466
7 47.753425
8 46.515068
9 49.364384
10 53.583562
11 49.983562
12 50.178082
13 51.049315
14 49.813699
15 51.032877
16 49.501370
17 49.641096
18 49.463014
19 48.504110
20 49.621918
21 51.060274
22 48.736986
23 50.139726
You can extract the hour and month from the timestamps and store them as separate columns. Then you can group the data by hour and month and calculate the average for each hour/month combination. Finally, you can unstack the grouped data to have hours as rows and months as columns. Here's how you can achieve that:
import pandas as pd
import numpy as np
# 1. Data Initialization
# Create hourly timestamps for a year
dti = pd.date_range("2018-01-01", periods=8760, freq="H")
# Generate random data for the column 'data1'
np.random.seed(seed=1111)
data1 = np.random.randint(1, high=100, size=len(dti))
# Create the DataFrame
df = pd.DataFrame({'data1': data1}, index=dti)
# 2. Hour & Month Extraction
# Extract hour and month from the timestamp and store as separate columns
df['hour'] = df.index.hour
df['month'] = df.index.month
# 3. Grouping & Averaging
# Group by hour and month and compute the average for 'data1'
grouped_data = df.groupby(['hour', 'month'])['data1'].mean()
# 4. Unstacking/Pivoting
# Unstack the grouped data to have hours as rows and months as columns
unstacked_data = grouped_data.unstack()
print(unstacked_data)
which results in
month 1 2 3 4 5 6 \
hour
0 52.516129 43.214286 47.000000 44.433333 41.129032 43.866667
1 47.838710 57.678571 49.322581 52.200000 58.419355 52.000000
2 53.193548 54.714286 53.064516 46.666667 47.354839 45.466667
3 49.677419 50.821429 45.903226 48.700000 49.096774 47.400000
4 49.935484 40.678571 45.580645 46.966667 49.451613 55.833333
5 46.129032 43.607143 49.387097 50.266667 46.645161 49.966667
6 49.419355 50.250000 40.838710 54.366667 52.741935 56.466667
7 42.322581 43.642857 54.193548 39.233333 46.419355 38.766667
8 43.000000 49.357143 52.580645 38.866667 46.225806 49.666667
9 48.096774 48.428571 45.419355 44.433333 56.225806 48.200000
10 61.096774 52.714286 51.645161 59.500000 50.322581 58.300000
11 46.258065 46.178571 58.032258 47.900000 53.322581 50.866667
12 48.612903 47.071429 51.741935 55.333333 42.193548 57.966667
13 55.645161 52.892857 42.419355 51.966667 50.129032 55.800000
14 47.225806 55.464286 53.225806 58.233333 48.677419 47.033333
15 50.548387 57.928571 48.129032 43.333333 47.064516 59.466667
16 42.032258 53.928571 47.516129 47.966667 46.774194 48.333333
17 51.064516 46.535714 54.000000 51.233333 59.064516 44.100000
18 43.483871 51.607143 46.903226 56.733333 37.258065 50.366667
19 40.709677 34.392857 42.161290 52.733333 50.580645 50.466667
20 54.000000 51.535714 55.709677 46.600000 48.516129 41.733333
21 51.258065 58.000000 47.612903 40.766667 50.322581 50.866667
22 55.000000 47.642857 52.612903 49.566667 46.548387 41.400000
23 47.838710 49.285714 56.935484 47.800000 46.322581 53.800000
month 7 8 9 10 11 12
hour
0 54.903226 47.967742 49.533333 53.322581 50.700000 45.354839
1 38.483871 48.064516 45.200000 49.032258 47.500000 48.483871
2 48.258065 46.290323 54.133333 52.838710 47.100000 60.709677
3 54.225806 51.903226 45.666667 47.677419 45.800000 47.806452
4 51.903226 47.225806 50.666667 46.161290 48.600000 62.096774
5 50.193548 48.483871 52.566667 50.516129 56.766667 52.161290
6 48.354839 53.935484 54.300000 47.870968 44.300000 51.516129
7 58.548387 48.451613 56.966667 52.322581 46.133333 45.322581
8 42.225806 41.193548 47.966667 49.032258 46.400000 51.838710
9 42.322581 44.967742 56.200000 56.483871 41.333333 59.935484
10 49.419355 49.032258 57.600000 50.387097 58.900000 44.645161
11 51.000000 48.516129 46.100000 53.193548 51.333333 46.612903
12 58.580645 40.193548 46.933333 57.290323 43.466667 52.548387
13 53.677419 50.741935 49.833333 45.870968 54.966667 49.096774
14 48.709677 48.774194 51.133333 44.870968 48.866667 46.290323
15 47.774194 54.967742 51.633333 51.903226 46.566667 53.645161
16 59.709677 57.000000 45.533333 46.258065 52.533333 46.741935
17 39.419355 49.419355 49.233333 51.064516 53.000000 47.225806
18 50.935484 54.258065 54.600000 58.258065 44.233333 45.387097
19 54.774194 53.161290 50.066667 52.709677 55.366667 44.032258
20 47.709677 46.741935 46.366667 57.387097 46.700000 52.096774
21 52.258065 59.129032 50.366667 59.419355 43.066667 49.709677
22 50.935484 44.225806 56.966667 41.387097 44.000000 54.354839
23 55.129032 44.516129 52.400000 45.774194 48.233333 53.612903
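As an alternative, the same table can be produced in a single step with pd.pivot_table, which accepts arrays (here, the hour and month derived from the index) for its index and columns arguments and aggregates with a mean by default. A minimal sketch, rebuilding the same sample data as above:

```python
import pandas as pd
import numpy as np

# Rebuild the same sample data as in the question
dti = pd.date_range("2018-01-01", periods=8760, freq="H")
np.random.seed(seed=1111)
df = pd.DataFrame({'data1': np.random.randint(1, high=100, size=len(dti))},
                  index=dti)

# One call: hours as rows, months as columns, mean as the aggregation
result = pd.pivot_table(df, values='data1',
                        index=df.index.hour,
                        columns=df.index.month,
                        aggfunc='mean')
print(result.shape)  # (24, 12)
```

This avoids creating the intermediate 'hour' and 'month' columns, at the cost of being slightly less explicit about the grouping steps.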
If the output is not what you wanted, please provide the expected output data for more clarity.
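Since your frame also carries data2 through data4, note that the same groupby/unstack approach works on all columns at once; unstacking the month level then gives MultiIndex columns of the form (data column, month), from which each 24x12 table can be selected by name. A sketch, assuming the same four-column frame as in your second snippet:

```python
import pandas as pd
import numpy as np

# Same sample frame as in the question, with all four data columns
dti = pd.date_range("2018-01-01", periods=8760, freq="H")
np.random.seed(seed=1111)
df = pd.DataFrame({name: np.random.randint(1, high=100, size=len(dti))
                   for name in ['data1', 'data2', 'data3', 'data4']},
                  index=dti)

# Group every column by (hour, month) at once; unstacking the month
# level yields MultiIndex columns: (data column, month)
wide = df.groupby([df.index.hour, df.index.month]).mean().unstack()
print(wide.shape)           # (24, 48) -- 4 data columns x 12 months

# Select one data column to get the familiar 24x12 hour-by-month table
print(wide['data1'].shape)  # (24, 12)
```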