Search code examples
pandasdataframepivotpandas-melt

Is there an easy way to handle wide timeseries data with pandas


I have a wide dataframe that looks like this.

    Hour  Wed Aug 10 2022  Thu Aug 11 2022  Fri Aug 12 2022  Sat Aug 13 2022
0      1            52602            49281            52805            53069   
1      2            49970            46938            50135            50591   
2      3            48188            45494            48156            48837   
3      4            47046            44611            47162            47220   
4      5            46746            44375            46742            46300   
5      6            47493            45325            47259            46115   
6      7            48923            47073            48598            46100   
7      8            49568            47857            49208            46406   
8      9            52147            49854            51482            49274   
9     10            55879            53215            55066            53303   
10    11            60309            57576            59480            57773   
11    12            64943            62024            63799            61670   
12    13            68988            66202            67331            64791   
13    14            72274            69657            69557            67590   
14    15            74249            71855            71525            69363   
15    16            75062            73585            72573            70173   
16    17            74197            74163            72692            70607   
17    18            71764            73506            71726            70353   
18    19            68248            71413            69588            69105   
19    20            63552            68774            67319            66704   
20    21            61337            66328            64784            64501   
21    22            59275            63760            62836            62415   
22    23            55960            60090            59766            59115   
23    24            52384            56233            56341            55681  

I would like it to look like this:

Date    Hour    Values

8/10/2022   1   52602

8/10/2022   2   49970

8/10/2022   3   48188

8/10/2022   4   47046

8/10/2022   5   46746

8/10/2022   6   47493

8/10/2022   7   48923

8/10/2022   8   49568

8/10/2022   9   52147

8/10/2022   10  55879

8/10/2022   11  60309

8/10/2022   12  64943

8/10/2022   13  68988

8/10/2022   14  72274

8/10/2022   15  74249

8/10/2022   16  75062

8/10/2022   17  74197

8/10/2022   18  71764

8/10/2022   19  68248

8/10/2022   20  63552

8/10/2022   21  61337

8/10/2022   22  59275

8/10/2022   23  55960

8/10/2022   24  52384

8/11/2022   1   49281

8/11/2022   2   46938

8/11/2022   3   45494

8/11/2022   4   44611

8/11/2022   5   44375

8/11/2022   6   45325

8/11/2022   7   47073

8/11/2022   8   47857

8/11/2022   9   49854

8/11/2022   10  53215

8/11/2022   11  57576

8/11/2022   12  62024

8/11/2022   13  66202

8/11/2022   14  69657

8/11/2022   15  71855

8/11/2022   16  73585

8/11/2022   17  74163

8/11/2022   18  73506

8/11/2022   19  71413

8/11/2022   20  68774

8/11/2022   21  66328

8/11/2022   22  63760

8/11/2022   23  60090

8/11/2022   24  56233

I have tried melt and wide_to_long, but can't get this exact output. Can someone please point me in the right direction?

I'm sorry I can't even figure out how to demonstrate the output I want correctly.


Solution

  • # Melt your data, keeping the `Hour` column.
    df = df.melt('Hour', var_name='Date', value_name='Values')
    # Convert to Datetime.
    df['Date'] = pd.to_datetime(df['Date'])
    # Reorder columns as desired.
    df = df[['Date', 'Hour', 'Values']]
    print(df)
    

    Output:

             Date  Hour  Values
    0  2022-08-10     1   52602
    1  2022-08-10     2   49970
    2  2022-08-10     3   48188
    3  2022-08-10     4   47046
    4  2022-08-10     5   46746
    ..        ...   ...     ...
    91 2022-08-13    20   66704
    92 2022-08-13    21   64501
    93 2022-08-13    22   62415
    94 2022-08-13    23   59115
    95 2022-08-13    24   55681
    
    [96 rows x 3 columns]
    

    Additionally, I'd consider merging your Date and Hour columns into a proper timestamp:

                                  # it's unclear if Hour 1 is 1am or midnight~
    df['timestamp'] = df.Date.add(pd.to_timedelta(df.Hour.sub(1), unit='h'))
    print(df[['timestamp', 'Values']])
    
    # Output:
    
                 timestamp  Values
    0  2022-08-10 00:00:00   52602
    1  2022-08-10 01:00:00   49970
    2  2022-08-10 02:00:00   48188
    3  2022-08-10 03:00:00   47046
    4  2022-08-10 04:00:00   46746
    ..                 ...     ...
    91 2022-08-13 19:00:00   66704
    92 2022-08-13 20:00:00   64501
    93 2022-08-13 21:00:00   62415
    94 2022-08-13 22:00:00   59115
    95 2022-08-13 23:00:00   55681
    
    [96 rows x 2 columns]