So i have a data frame
data = {
'Type': ['Z', 'x', 'A', 'A', 'A', 'D', 'B', 'B','Z', 'x', 'A', 'A', 'A', 'D', 'B', 'B'],
'Val': ['012', '11', '123', '567', '101', '22', 'T54', '111','012', '11', '123', '500', '101', '22', 'T54', '111']
}
df = pd.DataFrame(data)
I simply want to pivot this data frame maintaining the original order of the columns
so output will look like this
Z | x | A | D | B |
---|---|---|---|---|
012 | 11 | 123 | 22 | T54 |
nan | nan | 567 | nan | 111 |
nan | nan | 101 | nan | nan |
012 | 11 | 123 | 22 | T54 |
nan | nan | 500 | nan | 111 |
nan | nan | 101 | nan | nan |
note that the order of the columns should be maintained and whichever columns are repeating its value will keep getting added to new rows.
I have tried this code but unfortunately not getting the desired output
df['Type'] = pd.Categorical(df['Type'], categories=df['Type'].unique(), ordered=True)
df['Row'] = df.groupby('Type').cumcount()
df = df.pivot(index='Row', columns='Type', values='Val')
df = df.reset_index(drop=True)
df.columns.name = None
You need to use a double index, once to determine when the values restart to the beginning (taking advantage of your Categorical and of shift
+cumsum
), and once with the groupby.cumcount
:
df['Type'] = pd.Categorical(df['Type'], categories=df['Type'].unique(),
ordered=True)
out = (df
.assign(idx=lambda d: d['Type'].lt(d['Type'].shift()).cumsum(),
idx2=lambda d: d.groupby(['idx', 'Type']).cumcount(),
)
.pivot(index=['idx', 'idx2'], columns='Type', values='Val')
)
Output:
Type Z x A D B
idx idx2
0 0 012 11 123 22 T54
1 NaN NaN 567 NaN 111
2 NaN NaN 101 NaN NaN
1 0 012 11 123 22 T54
1 NaN NaN 500 NaN 111
2 NaN NaN 101 NaN NaN