Search code examples
pythonpandasdataframemultiple-columns

How to add columns with the same name into last line in order to have only one column pandas


There is a dataframe that contains several columns with the same name and I don't know the exact number of columns with the same. I want to keep only one column on my dataframe and add the the values of the other to the last line.

I don't know the best way to proceed.

Simple example below.

dataframe:

col_1 col_1 col_1 ... col_2 col_2 col_2
'abc' 'def' 'ghi' ... 123      456   789

expected output:

col_1 col_2
'abc'   123  
'def'   456
'ghi'   789

Solution

  • You can use:

    (df.melt()
       .assign(idx=lambda d: d.groupby('variable').cumcount())
       .pivot(index='idx', columns='variable', values='value')
       .reset_index(drop=True).rename_axis(columns=None)
    )
    

    Or:

    (df.stack().to_frame()
       .pipe(lambda s: s.set_index(s.groupby(level=1).cumcount(), append=True))
       .unstack(level=1)[0].droplevel(0)
    )
    

    Output:

      col_1 col_2
    0   abc   123
    1   def   456
    2   ghi   789