I have some data taken at different time points in wide format, and need to convert it to long format to aid with analysis and to merge it with another dataset.
The format of the data is (where A_0 means value of A at time 0, A_15 means value at time 15):
import pandas as pd
df_wide = pd.DataFrame({'Subject': ['AA', 'BB', 'CC', 'DD'],
'A_0': [1, 2, 3, 4],
'A_15': [2, 3, 4, 5],
'A_30': [3, 4, 5, 6],
'B_0': [1, 2, 3, 4],
'B_15': [2, 3, 4, 5],
'B_30': [3, 4, 5, 6],
'C_0': [1, 2, 3, 4],
'C_15': [2, 3, 4, 5],
'C_30': [3, 4, 5, 6]
}
)
df_wide
Subject A_0 A_15 A_30 B_0 B_15 B_30 C_0 C_15 C_30
0 AA 1 2 3 1 2 3 1 2 3
1 BB 2 3 4 2 3 4 2 3 4
2 CC 3 4 5 3 4 5 3 4 5
3 DD 4 5 6 4 5 6 4 5 6
I wish to convert this to long format and generate the variable time as follow:
df_long = pd.DataFrame({'Subject': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC', 'CC', 'CC', 'DD', 'DD', 'DD'],
'Time': [0, 15, 30, 0, 15, 30, 0, 15, 30, 0, 15, 30],
'A': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],
'B': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],
'C': [1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6],
})
df_long
Subject Time A B C
0 AA 0 1 1 1
1 AA 15 2 2 2
2 AA 30 3 3 3
3 BB 0 2 2 2
4 BB 15 3 3 3
5 BB 30 4 4 4
6 CC 0 3 3 3
7 CC 15 4 4 4
8 CC 30 5 5 5
9 DD 0 4 4 4
10 DD 15 5 5 5
11 DD 30 6 6 6
I've read the pivot and melt functions but can't quite get my head around it - any assistance would be greatly appreciated.
Have tired pivot and melt but unsure how to generate the time variable.
You can also combine melt, split and pivot.
df_long=pd.melt (df_wide, id_vars=["Subject"], var_name="Time")
df_long[["Time_letter", "Time"]]=df_long["Time"].str.split("_", expand=True)
df_final=pd.pivot(df_long, index=["Subject", "Time"], columns="Time_letter", values="value")
Output (has multiindex - can be flattened if you want):
Time_letter A B C
Subject Time
AA 0 1 1 1
15 2 2 2
30 3 3 3
BB 0 2 2 2
15 3 3 3
30 4 4 4
CC 0 3 3 3
15 4 4 4
30 5 5 5
DD 0 4 4 4
15 5 5 5
30 6 6 6