There is a dataframe that contains several columns with the same name and I don't know the exact number of columns with the same. I want to keep only one column on my dataframe and add the the values of the other to the last line.
I don't know the best way to proceed.
Simple example below.
dataframe:
col_1 col_1 col_1 ... col_2 col_2 col_2
'abc' 'def' 'ghi' ... 123 456 789
expected output:
col_1 col_2
'abc' 123
'def' 456
'ghi' 789
You can use:
(df.melt()
.assign(idx=lambda d: d.groupby('variable').cumcount())
.pivot(index='idx', columns='variable', values='value')
.reset_index(drop=True).rename_axis(columns=None)
)
Or:
(df.stack().to_frame()
.pipe(lambda s: s.set_index(s.groupby(level=1).cumcount(), append=True))
.unstack(level=1)[0].droplevel(0)
)
Output:
col_1 col_2
0 abc 123
1 def 456
2 ghi 789