Search code examples
pythonpandasdataframetimestampdata-processing

Is there a way to append values of the column with same column name using python?


I have a dataset with some of the column having identical column name. I want to merge column with same column name such that the values are appended as rows. And, for the columns that don't have column with same column name, 0 is appended in rows.

I have tried melting, but it doesn't seems to work for the format i need.

Sample Data:

print (df)
       Date  Column_A  Column_A  Column_B
0  1/2/2018         3         2         3
1  2/2/2018         4         7         1
2  3/2/2018         2         2         6
3  4/2/2018         1         1         4    

Expected output:

       Date  Column_A  Column_B
0  1/2/2018         3       3.0
1  2/2/2018         4       1.0
2  3/2/2018         2       6.0
3  4/2/2018         1       4.0
4  1/2/2018         2       0.0
5  2/2/2018         7       0.0
6  3/2/2018         2       0.0
7  4/2/2018         1       0.0

Solution

  • Idea is create MultiIndex in columns with GroupBy.cumcount, then reshape by DataFrame.stack, sorting by second level of MultiIndex by DataFrame.sort_index and last remove second level with convert first level to column Date by double DataFrame.reset_index:

    df = df.set_index('Date')
    s = df.columns.to_series()
    
    df.columns = [df.columns, s.groupby(s).cumcount()]
    df = df.stack().sort_index(level=1).fillna(0).reset_index(level=1, drop=True).reset_index()
    print (df)
           Date  Column_A  Column_B
    0  1/2/2018         3       3.0
    1  2/2/2018         4       1.0
    2  3/2/2018         2       6.0
    3  4/2/2018         1       4.0
    4  1/2/2018         2       0.0
    5  2/2/2018         7       0.0
    6  3/2/2018         2       0.0
    7  4/2/2018         1       0.0