The dataframe below tracks daily changes in parameters "rate"
, "pressure"
, and "cut"
. The date information is split between a column 'Months&year'
(Jan 2024, Dec 2023, etc...) on one side, while days (1-31) make the index of the dataframe:
dataframe.head()
Month&Year parameter 1 2 3 .. 31
Jan 2024 rate 22 1 7 6
Jan 2024 pressure 67 8 8 8
Jan 2024 cut 2 67 7 2
Dec 2023 rate 8 9 1 0
Dec 2023 pressure 6 8 11 3
Dec 2023 cut 7 8 77 8
I seek to gather the date information ("day"
, "month&year"
) on one side so as to reshape the dataframe into a timeseries, this way:
Month&Year parameter rate pressure cut
Jan 2024 1 22 67 2
Jan 2024 2 1 8 67
Jan 2024 3 7 8 7
..
Jan 2024 31 6 8 2
Dec 2023 1 8 6 7
Dec 2023 2 9 8 8
Dec 2023 3 1 11 77
..
Dec 2023 31 0 3 8
I guess I will need to split this dataframe to multiple datafames using groupby for month&year, transpose each of new datasets, and then merge them again.
Here is what I have achieved so far, when I used groupby and splited this dataframe to separate dataframes:
in:
mo=dataframe.groupby('Month&year')
dataframe_months=[mo.get_group(x) for x in mo.groups]
print ("result: \"n", dataframe_months)
out:
Month&Year parameter 1 2 3 .. 31
Jan 2024 rate 22 1 7 6
Jan 2024 pressure 67 8 8 8
Jan 2024 cut 2 67 7 2
Month&Year parameter 1 2 3 .. 31
Dec 2023 rate 8 9 1 0
Dec 2023 pressure 6 8 11 3
Dec 2023 cut 7 8 77 8
Could you please suggest how can I refer to splited dataframes without writing manually their names (there are many years) to transpose them and merge in the same order as before.
Your proposed strategy looks unnecessarily complicated for what is a mere reshaping of your dataframe. The pandas user guide recommends a pivot_table:
Repeating input data for clarity
(skipping undisclosed days)
df = pd.DataFrame({'Month&Year': ['Jan 2024','Jan 2024','Jan 2024','Dec 2023','Dec 2023','Dec 2023'],
'parameter': ['rate', 'pressure', 'cut', 'rate', 'pressure', 'cut'],
1: [22, 67, 2, 8, 6, 7],
2: [1, 8, 67, 9, 8, 8],
3: [7, 8, 7, 1, 11, 77],
31: [6, 8, 2, 0, 3, 8]})
Copy-pasting the output of print(df)
:
Month&Year parameter 1 2 3 31
0 Jan 2024 rate 22 1 7 6
1 Jan 2024 pressure 67 8 8 8
2 Jan 2024 cut 2 67 7 2
3 Dec 2023 rate 8 9 1 0
4 Dec 2023 pressure 6 8 11 3
5 Dec 2023 cut 7 8 77 8
1. Pivot 'Month&Year'
and the days (1 to 31) to the same side
Here we swap 'Month&Year'
up as another level of columns using pivot_table
, then transpose (T
) and finally flatten the table (reset_index
).
dfp = pd.pivot_table(data = df,
values = [1,2,3,31],
columns = 'Month&Year',
index = 'parameter'
).T.reset_index()
parameter level_0 Month&Year cut pressure rate
0 1 Dec 2023 7 6 8
1 1 Jan 2024 2 67 22
2 2 Dec 2023 8 8 9
3 2 Jan 2024 67 8 1
4 3 Dec 2023 77 11 1
5 3 Jan 2024 7 8 7
6 31 Dec 2023 8 3 0
7 31 Jan 2024 2 8 6
2. Finishing touch for readability of index and columns
'parameter'
as name for the column holding the days. This felt confusing so I named it 'Day'
instead.'parameter'
ends up as the name of the columns index, where it becomes confusing as well. We can simply remove that label.dfp = dfp.rename(columns = {'level_0' : 'Day'})
dfp.columns = dfp.columns.rename(None)
dfp
Day Month&Year cut pressure rate
0 1 Dec 2023 7 6 8
1 1 Jan 2024 2 67 22
2 2 Dec 2023 8 8 9
3 2 Jan 2024 67 8 1
4 3 Dec 2023 77 11 1
5 3 Jan 2024 7 8 7
6 31 Dec 2023 8 3 0
7 31 Jan 2024 2 8 6
Sorting by date will be another question involving pandas.datetime.