Search code examples
pythonpandasrenamemultiple-columnsmulti-index

Pandas: Is there a way to use something like 'droplevel' and in process, rename the other level using the dropped level labels as prefix/suffix?


Screenshot of the query below:

Groupby Query

Is there a way to easily drop the upper level column index and a have a single level with labels such as points_prev_amax, points_prev_amin, gf_prev_amax, gf_prev_amin and so on?


Solution

  • Use list comprehension for set new column names:

    df.columns = df.columns.map('_'.join)
    
    Or:
    
    df.columns = ['_'.join(col) for col in df.columns]
    

    Sample:

    df = pd.DataFrame({'A':[1,2,2,1],
                       'B':[4,5,6,4],
                       'C':[7,8,9,1],
                       'D':[1,3,5,9]})
    
    print (df)
       A  B  C  D
    0  1  4  7  1
    1  2  5  8  3
    2  2  6  9  5
    3  1  4  1  9
    
    df = df.groupby('A').agg([max, min])
    
    df.columns = df.columns.map('_'.join)
    print (df)
       B_max  B_min  C_max  C_min  D_max  D_min
    A                                          
    1      4      4      7      1      9      1
    2      6      5      9      8      5      3
    

    print (['_'.join(col) for col in df.columns])
    ['B_max', 'B_min', 'C_max', 'C_min', 'D_max', 'D_min']
    
    df.columns = ['_'.join(col) for col in df.columns]
    print (df)
       B_max  B_min  C_max  C_min  D_max  D_min
    A                                          
    1      4      4      7      1      9      1
    2      6      5      9      8      5      3
    

    If need prefix simple swap items of tuples:

    df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
    print (df)
       max_B  min_B  max_C  min_C  max_D  min_D
    A                                          
    1      4      4      7      1      9      1
    2      6      5      9      8      5      3
    

    Another solution:

    df.columns = ['{}_{}'.format(i[1], i[0]) for i in df.columns]
    print (df)
       max_B  min_B  max_C  min_C  max_D  min_D
    A                                          
    1      4      4      7      1      9      1
    2      6      5      9      8      5      3
    

    If len of columns is big (10^6), then rather use to_series and str.join:

    df.columns = df.columns.to_series().str.join('_')