Search code examples
pandasdataframemulti-index

How to add a level to a dataframe in pandas where the new level comes from a column of another dataframe?


I have a pandas dataframe that I have named as data_df, Which looks like this.

        A             B             C               D
    a1 a2  a3     b1 b2 b3      c1 c2 c3        d1 d2 d3
1   1   2   3     5   6  7      3  4   5        10  1  8
2   0   5   4     8   7  9      1  5   7        8   6  1

I have another dataframe named universe_df which looks like this

Index Symbol Category
1       A       X
2       B       Y
3       C       Y
4       D       Z

I want the output to look like this:

       X                     Y                    Z
       A               B           C              D
    a1 a2  a3       b1 b2 b3    c1 c2 c3       d1 d2 d3
1   1  2   3         5  6  7     3 4   5        10 1  8
2   0  5   4         8  7  9     1  5  7         8 6  1


How should I go about it? Thanks in advance.


Solution

  • Sample data:

    data_df = pd.DataFrame({('A', 'a1'): {1: 1, 2: 0}, ('A', 'a2'): {1: 2, 2: 5}, ('A', 'a3'): {1: 3, 2: 4}, ('B', 'b1'): {1: 5, 2: 8}, ('B', 'b2'): {1: 6, 2: 7}, ('B', 'b3'): {1: 7, 2: 9}, ('C', 'c1'): {1: 3, 2: 1}, ('C', 'c2'): {1: 4, 2: 5}, ('C', 'c3'): {1: 5, 2: 7}, ('D', 'd1'): {1: 10, 2: 8}, ('D', 'd2'): {1: 1, 2: 6}, ('D', 'd3'): {1: 8, 2: 1}})
    print (data_df)
       A        B        C         D      
      a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
    1  1  2  3  5  6  7  3  4  5  10  1  8
    2  0  5  4  8  7  9  1  5  7   8  6  1
    
    
    universe_df = pd.DataFrame({'Symbol': {1: 'A', 2: 'B', 3: 'C', 4: 'D'}, 'Category': {1: 'X', 2: 'Y', 3: 'Y', 4: 'Z'}})
    print (universe_df)
      Symbol Category
    1      A        X
    2      B        Y
    3      C        Y
    4      D        Z
    

    First check if same ouput, no trailing whitespaces:

    print (universe_df)
    MultiIndex([('A', 'a1'),
                ('A', 'a2'),
                ('A', 'a3'),
                ('B', 'b1'),
                ('B', 'b2'),
                ('B', 'b3'),
                ('C', 'c1'),
                ('C', 'c2'),
                ('C', 'c3'),
                ('D', 'd1'),
                ('D', 'd2'),
                ('D', 'd3')],
               )
    
    print (universe_df['Symbol'].tolist())
    ['A', 'B', 'C', 'D']
    

    Create Series for mapping first level in list comprehension and recreate MultiIndex:

    d = universe_df.set_index('Symbol')['Category']
    
    data_df.columns = pd.MultiIndex.from_tuples([(d.get(a), a, b) for a, b in data_df.columns])
    print (data_df)
       X        Y                  Z      
       A        B        C         D      
      a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
    1  1  2  3  5  6  7  3  4  5  10  1  8
    2  0  5  4  8  7  9  1  5  7   8  6  1
    

    Or you can mapping first level of MultiIndex:

    d = universe_df.set_index('Symbol')['Category']
    
    a = data_df.columns.get_level_values(0)
    b = data_df.columns.get_level_values(1)
    
    data_df.columns = [a.map(d),a, b]
    

    Another idea with helper DataFrame:

    d = universe_df.set_index('Symbol')['Category']
    
    df = data_df.columns.to_frame()
    df.insert(0, '', df[0].map(d))
    
    data_df.columns = pd.MultiIndex.from_frame(df)
    print (data_df)
       X        Y                  Z      
    0  A        B        C         D      
    1 a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
    1  1  2  3  5  6  7  3  4  5  10  1  8
    2  0  5  4  8  7  9  1  5  7   8  6  1