python pandas group-by time-series pivot-table

Reshape dataframe into a timeseries when time information is split into columns and index?

The dataframe below tracks daily changes in parameters "rate", "pressure", and "cut". The date information is split between a column 'Months&year' (Jan 2024, Dec 2023, etc...) on one side, while days (1-31) make the index of the dataframe:

dataframe.head()

Month&Year  parameter   1   2   3   ..  31
Jan 2024    rate       22   1   7       6
Jan 2024    pressure    67  8   8       8
Jan 2024    cut         2   67  7       2
Dec 2023    rate        8   9   1       0
Dec 2023    pressure    6   8   11      3
Dec 2023    cut         7   8   77      8

I seek to gather the date information ("day", "month&year") on one side so as to reshape the dataframe into a timeseries, this way:

Month&Year  parameter   rate    pressure    cut
Jan 2024    1   22  67  2
Jan 2024    2   1   8   67
Jan 2024    3   7   8   7
..          
Jan 2024    31  6   8   2
    
Dec 2023    1   8   6   7
Dec 2023    2   9   8   8
Dec 2023    3   1   11  77
..          
Dec 2023    31  0   3   8

I guess I will need to split this dataframe to multiple datafames using groupby for month&year, transpose each of new datasets, and then merge them again.

Here is what I have achieved so far, when I used groupby and splited this dataframe to separate dataframes:

in:
    mo=dataframe.groupby('Month&year')
    dataframe_months=[mo.get_group(x) for x in mo.groups]
    print ("result: \"n", dataframe_months)


out:

    Month&Year  parameter   1   2   3   ..  31
    Jan 2024    rate    22  1   7       6
    Jan 2024    pressure    67  8   8       8
    Jan 2024    cut 2   67  7       2
                                                
                        
    Month&Year  parameter   1   2   3   ..  31
    Dec 2023    rate    8   9   1       0
    Dec 2023    pressure    6   8   11      3
    Dec 2023    cut 7   8   77      8

Could you please suggest how can I refer to splited dataframes without writing manually their names (there are many years) to transpose them and merge in the same order as before.

Solution

Your proposed strategy looks unnecessarily complicated for what is a mere reshaping of your dataframe. The pandas user guide recommends a pivot_table:

Repeating input data for clarity
(skipping undisclosed days)

df = pd.DataFrame({'Month&Year': ['Jan 2024','Jan 2024','Jan 2024','Dec 2023','Dec 2023','Dec 2023'],
                   'parameter': ['rate', 'pressure', 'cut', 'rate', 'pressure', 'cut'],
                   1: [22, 67, 2, 8, 6, 7],
                   2: [1, 8, 67, 9, 8, 8],
                   3: [7, 8, 7, 1, 11, 77],
                   31: [6, 8, 2, 0, 3, 8]})

Copy-pasting the output of print(df):

  Month&Year parameter   1   2   3  31
0   Jan 2024      rate  22   1   7   6
1   Jan 2024  pressure  67   8   8   8
2   Jan 2024       cut   2  67   7   2
3   Dec 2023      rate   8   9   1   0
4   Dec 2023  pressure   6   8  11   3
5   Dec 2023       cut   7   8  77   8

1. Pivot 'Month&Year' and the days (1 to 31) to the same side
Here we swap 'Month&Year' up as another level of columns using pivot_table, then transpose (T) and finally flatten the table (reset_index).

dfp = pd.pivot_table(data    = df,
                     values  = [1,2,3,31],
                     columns = 'Month&Year',
                     index   = 'parameter'
                    ).T.reset_index()

parameter  level_0 Month&Year  cut  pressure  rate
0                1   Dec 2023    7         6     8
1                1   Jan 2024    2        67    22
2                2   Dec 2023    8         8     9
3                2   Jan 2024   67         8     1
4                3   Dec 2023   77        11     1
5                3   Jan 2024    7         8     7
6               31   Dec 2023    8         3     0
7               31   Jan 2024    2         8     6

2. Finishing touch for readability of index and columns

Your desired output dataframe showed 'parameter' as name for the column holding the days. This felt confusing so I named it 'Day' instead.
As a result of the pivot, the label 'parameter' ends up as the name of the columns index, where it becomes confusing as well. We can simply remove that label.

dfp = dfp.rename(columns = {'level_0' : 'Day'})
dfp.columns = dfp.columns.rename(None)
dfp

   Day Month&Year  cut  pressure  rate
0    1   Dec 2023    7         6     8
1    1   Jan 2024    2        67    22
2    2   Dec 2023    8         8     9
3    2   Jan 2024   67         8     1
4    3   Dec 2023   77        11     1
5    3   Jan 2024    7         8     7
6   31   Dec 2023    8         3     0
7   31   Jan 2024    2         8     6

Sorting by date will be another question involving pandas.datetime.