Search code examples
pythonpandasgroup-bytime-seriespivot-table

Reshape dataframe into a timeseries when time information is split into columns and index?


The dataframe below tracks daily changes in parameters "rate", "pressure", and "cut". The date information is split between a column 'Months&year' (Jan 2024, Dec 2023, etc...) on one side, while days (1-31) make the index of the dataframe:

dataframe.head()

Month&Year  parameter   1   2   3   ..  31
Jan 2024    rate       22   1   7       6
Jan 2024    pressure    67  8   8       8
Jan 2024    cut         2   67  7       2
Dec 2023    rate        8   9   1       0
Dec 2023    pressure    6   8   11      3
Dec 2023    cut         7   8   77      8

I seek to gather the date information ("day", "month&year") on one side so as to reshape the dataframe into a timeseries, this way:

Month&Year  parameter   rate    pressure    cut
Jan 2024    1   22  67  2
Jan 2024    2   1   8   67
Jan 2024    3   7   8   7
..          
Jan 2024    31  6   8   2
    
Dec 2023    1   8   6   7
Dec 2023    2   9   8   8
Dec 2023    3   1   11  77
..          
Dec 2023    31  0   3   8

I guess I will need to split this dataframe to multiple datafames using groupby for month&year, transpose each of new datasets, and then merge them again.

Here is what I have achieved so far, when I used groupby and splited this dataframe to separate dataframes:

in:
    mo=dataframe.groupby('Month&year')
    dataframe_months=[mo.get_group(x) for x in mo.groups]
    print ("result: \"n", dataframe_months)


out:

    Month&Year  parameter   1   2   3   ..  31
    Jan 2024    rate    22  1   7       6
    Jan 2024    pressure    67  8   8       8
    Jan 2024    cut 2   67  7       2
                                                
                        
    Month&Year  parameter   1   2   3   ..  31
    Dec 2023    rate    8   9   1       0
    Dec 2023    pressure    6   8   11      3
    Dec 2023    cut 7   8   77      8

Could you please suggest how can I refer to splited dataframes without writing manually their names (there are many years) to transpose them and merge in the same order as before.


Solution

  • Your proposed strategy looks unnecessarily complicated for what is a mere reshaping of your dataframe. The pandas user guide recommends a pivot_table:

    Repeating input data for clarity
    (skipping undisclosed days)

    df = pd.DataFrame({'Month&Year': ['Jan 2024','Jan 2024','Jan 2024','Dec 2023','Dec 2023','Dec 2023'],
                       'parameter': ['rate', 'pressure', 'cut', 'rate', 'pressure', 'cut'],
                       1: [22, 67, 2, 8, 6, 7],
                       2: [1, 8, 67, 9, 8, 8],
                       3: [7, 8, 7, 1, 11, 77],
                       31: [6, 8, 2, 0, 3, 8]})
    

    Copy-pasting the output of print(df):

      Month&Year parameter   1   2   3  31
    0   Jan 2024      rate  22   1   7   6
    1   Jan 2024  pressure  67   8   8   8
    2   Jan 2024       cut   2  67   7   2
    3   Dec 2023      rate   8   9   1   0
    4   Dec 2023  pressure   6   8  11   3
    5   Dec 2023       cut   7   8  77   8
    

    1. Pivot 'Month&Year' and the days (1 to 31) to the same side
    Here we swap 'Month&Year' up as another level of columns using pivot_table, then transpose (T) and finally flatten the table (reset_index).

    dfp = pd.pivot_table(data    = df,
                         values  = [1,2,3,31],
                         columns = 'Month&Year',
                         index   = 'parameter'
                        ).T.reset_index()
    
    parameter  level_0 Month&Year  cut  pressure  rate
    0                1   Dec 2023    7         6     8
    1                1   Jan 2024    2        67    22
    2                2   Dec 2023    8         8     9
    3                2   Jan 2024   67         8     1
    4                3   Dec 2023   77        11     1
    5                3   Jan 2024    7         8     7
    6               31   Dec 2023    8         3     0
    7               31   Jan 2024    2         8     6
    

    2. Finishing touch for readability of index and columns

    • Your desired output dataframe showed 'parameter' as name for the column holding the days. This felt confusing so I named it 'Day' instead.
    • As a result of the pivot, the label 'parameter' ends up as the name of the columns index, where it becomes confusing as well. We can simply remove that label.
    dfp = dfp.rename(columns = {'level_0' : 'Day'})
    dfp.columns = dfp.columns.rename(None)
    dfp
    
       Day Month&Year  cut  pressure  rate
    0    1   Dec 2023    7         6     8
    1    1   Jan 2024    2        67    22
    2    2   Dec 2023    8         8     9
    3    2   Jan 2024   67         8     1
    4    3   Dec 2023   77        11     1
    5    3   Jan 2024    7         8     7
    6   31   Dec 2023    8         3     0
    7   31   Jan 2024    2         8     6
    

    Sorting by date will be another question involving pandas.datetime.