python pandas dataframe duplicates reindex

pandas DataFrame reset_index which can handle duplicate column names?


Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to discard the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when there are duplicate column names.)

Sample input:

 df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'A', 'B'])

     A   A   B
0   0.5 0.3 0.9
1   0.7 0.9 0.3
2   0.9 0.4 0.8
3   0.6 0.2 0.9
4   0.7 0.4 0.6

Expected output:

     0   1   2
0   0.8 0.1 0.2
1   0.4 0.2 0.4
2   0.3 0.3 0.4
3   0.4 0.1 0.8
4   1.0 0.9 0.9

Solution

  • You can use the set_axis() method:

    In [54]: df
    Out[54]:
              A         A         B
    0  0.934900  0.817182  0.166270
    1  0.064543  0.139431  0.249576
    2  0.709349  0.731913  0.965048
    3  0.284955  0.479898  0.496652
    4  0.520749  0.464256  0.999993
    
    In [55]: df = df.set_axis(range(len(df.columns)), axis=1)  # current signature: labels first; returns a new DataFrame
    
    In [56]: df
    Out[56]:
              0         1         2
    0  0.934900  0.817182  0.166270
    1  0.064543  0.139431  0.249576
    2  0.709349  0.731913  0.965048
    3  0.284955  0.479898  0.496652
    4  0.520749  0.464256  0.999993
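
As a self-contained sketch of the same idea, the snippet below renumbers the columns in two ways. It assumes a recent pandas release in which set_axis takes the labels first and returns a new DataFrame instead of modifying in place; the names renumbered and renumbered_copy are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame with a duplicated column label, as in the question.
df = pd.DataFrame(np.random.rand(5, 3), columns=["A", "A", "B"])

# Option 1: set_axis with the labels first and axis=1; returns a new DataFrame.
renumbered = df.set_axis(range(df.shape[1]), axis=1)

# Option 2: assign a default integer index to the columns of a copy.
renumbered_copy = df.copy()
renumbered_copy.columns = pd.RangeIndex(df.shape[1])

print(list(renumbered.columns))       # [0, 1, 2]
print(list(renumbered_copy.columns))  # [0, 1, 2]
```

Both options replace the labels by position, so duplicate names are never looked up and cause no ambiguity. Assigning to df.columns directly is the in-place variant if a copy is not needed.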