Search code examples
pythonpivotreshapemelt

How do I convert time series data from wide to long format using python (pandas package)?


I have some data taken at different time points in wide format, and need to convert it to long format to aid with analysis and to merge it with another dataset.

The format of the data is (where A_0 means value of A at time 0, A_15 means value at time 15):

import pandas as pd

df_wide = pd.DataFrame({'Subject': ['AA', 'BB', 'CC', 'DD'],
               'A_0': [1, 2, 3, 4],
               'A_15': [2, 3, 4, 5],
               'A_30': [3, 4, 5, 6],
               'B_0': [1, 2, 3, 4],
               'B_15': [2, 3, 4, 5],
               'B_30': [3, 4, 5, 6],
               'C_0': [1, 2, 3, 4],
               'C_15': [2, 3, 4, 5],
               'C_30': [3, 4, 5, 6]
                  
              }
             )

df_wide
  Subject  A_0  A_15  A_30  B_0  B_15  B_30  C_0  C_15  C_30
0      AA    1     2     3    1     2     3    1     2     3
1      BB    2     3     4    2     3     4    2     3     4
2      CC    3     4     5    3     4     5    3     4     5
3      DD    4     5     6    4     5     6    4     5     6

I wish to convert this to long format and generate the variable time as follow:

df_long = pd.DataFrame({'Subject': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC', 'CC', 'CC', 'DD', 'DD',       'DD'],
                    'Time': [0, 15, 30, 0, 15, 30, 0, 15, 30, 0, 15, 30],
                    'A': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],
                    'B': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],
                    'C': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],       
                    })

df_long
   Subject  Time  A  B  C
0       AA     0  1  1  1
1       AA    15  2  2  2
2       AA    30  3  3  3
3       BB     0  2  2  2
4       BB    15  3  3  3
5       BB    30  4  4  4
6       CC     0  3  3  3
7       CC    15  4  4  4
8       CC    30  5  5  5
9       DD     0  4  4  4
10      DD    15  5  5  5
11      DD    30  6  6  6

I've read the pivot and melt functions but can't quite get my head around it - any assistance would be greatly appreciated.

Have tired pivot and melt but unsure how to generate the time variable.


Solution

  • You can also combine melt, split and pivot.

    df_long=pd.melt (df_wide, id_vars=["Subject"], var_name="Time")
    df_long[["Time_letter", "Time"]]=df_long["Time"].str.split("_", expand=True)
    df_final=pd.pivot(df_long, index=["Subject", "Time"], columns="Time_letter", values="value")
    

    Output (has multiindex - can be flattened if you want):

    Time_letter   A  B  C
    Subject Time
    AA      0     1  1  1
            15    2  2  2
            30    3  3  3
    BB      0     2  2  2
            15    3  3  3
            30    4  4  4
    CC      0     3  3  3
            15    4  4  4
            30    5  5  5
    DD      0     4  4  4
            15    5  5  5
            30    6  6  6