
Method to set new MultiIndex columns from a different DataFrame


Given a DataFrame (d) with MultiIndex columns, I would like to set another DataFrame (d2) as one of the 'multicolumns', such that the top level has some label, and the second level labels match those of the original:

import numpy as np
import pandas as pd
from pandas import DataFrame

np.random.seed(0)
abc = ['a', 'b', 'c']
mi = pd.MultiIndex.from_product([['A'], abc])
d = DataFrame(np.random.randint(0, 10, (4, 3)), columns=mi)
d
   A      
   a  b  c
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6

d2 = DataFrame(np.random.randint(0, 10, (4, 3)), columns=abc)
d2
   a  b  c
0  8  8  1
1  6  7  7
2  8  1  5
3  9  8  9

If possible, I would like to join them using a single built-in method that does what the following for loop does:

for c2 in d2:
    d['B', c2] = d2[c2]
d
   A        B      
   a  b  c  a  b  c
0  5  0  3  8  8  1
1  3  7  9  6  7  7
2  3  5  2  8  1  5
3  4  7  6  9  8  9

For a DataFrame with single-level columns:

d3 = d.copy()
d3.columns = d3.columns.droplevel(0)
d3 = d3.rename(columns=dict(zip('abc', 'def')))
d3
   d  e  f
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6

I can do the following:

d3[d2.columns] = d2
d3
   d  e  f  a  b  c
0  5  0  3  8  8  1
1  3  7  9  6  7  7
2  3  5  2  8  1  5
3  4  7  6  9  8  9

But when I try this with the MultiIndexed DataFrame, I get errors:

d['B', tuple(d2.columns)] = d2
=> ValueError: Wrong number of items passed 3, placement implies 1
d['B'][tuple(d2.columns)] = d2
=> KeyError: 'B'
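Why both attempts fail, sketched against the question's own seed-0 setup: in d['B', tuple(d2.columns)] the key is parsed as one column label, ('B', ('a', 'b', 'c')), so pandas expects one column of data but receives three. In d['B'][...], the lookup d['B'] runs first and fails because no top-level 'B' exists yet. What the loop really creates is three separate two-level labels, which pd.MultiIndex.from_product can build in one call. The names bad_key and new_labels are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (seed 0).
np.random.seed(0)
abc = ['a', 'b', 'c']
d = pd.DataFrame(np.random.randint(0, 10, (4, 3)),
                 columns=pd.MultiIndex.from_product([['A'], abc]))
d2 = pd.DataFrame(np.random.randint(0, 10, (4, 3)), columns=abc)

# The failing key is ONE label whose second element is itself a tuple.
bad_key = ('B', tuple(d2.columns))
print(bad_key)

# The loop creates three separate two-level labels instead.
new_labels = pd.MultiIndex.from_product([['B'], d2.columns])
print(list(new_labels))

# 'B' is not yet a top-level label, hence the KeyError from d['B'].
print('B' in d.columns.get_level_values(0))
```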

Is there a built-in method to do this, i.e. the per-column assignment above, but for multiple columns at once?


Solution

  • UPDATE:

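    import pandas as pd

    def add_multicolumn(df, df2, new_col_name):
        """Return df with df2's columns appended under the top-level label new_col_name."""
        tmp = df2.copy()    # relabel a copy so the caller's df2 keeps its original columns
        tmp.columns = pd.MultiIndex.from_product([[new_col_name], df2.columns.tolist()])
        return pd.concat([df, tmp], axis=1)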
    

    Assuming we have the following DataFrame and want to add a third 'multicolumn', C:

    In [114]: d
    Out[114]:
       A        B
       a  b  c  a  b  c
    0  5  5  7  0  7  2
    1  5  3  9  0  5  5
    2  5  8  5  5  5  7
    3  5  4  5  4  5  2
    

    Using the function:

    In [132]: add_multicolumn(d, d2, 'C')
    Out[132]:
       A        B        C
       a  b  c  a  b  c  a  b  c
    0  5  5  7  0  7  2  0  7  2
    1  5  3  9  0  5  5  0  5  5
    2  5  8  5  5  5  7  5  5  7
    3  5  4  5  4  5  2  4  5  2
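A quick sanity check, sketched on the question's seed-0 data rather than the answer's session: add_multicolumn should produce the same frame as the question's for loop. One behavioral difference is worth knowing: pd.concat(axis=1) performs an outer join on row labels, so if d2's index differs from d's, the result gains the extra rows and fills the gaps with NaN, whereas the loop's column assignment keeps only d's rows. The names looped, one_shot, shifted and misaligned are illustrative only.

```python
import numpy as np
import pandas as pd

def add_multicolumn(df, df2, new_col_name):
    # Same function as in the answer: relabel a copy of df2, then concat column-wise.
    tmp = df2.copy()
    tmp.columns = pd.MultiIndex.from_product([[new_col_name], df2.columns.tolist()])
    return pd.concat([df, tmp], axis=1)

# Rebuild the question's data (seed 0).
np.random.seed(0)
abc = ['a', 'b', 'c']
d = pd.DataFrame(np.random.randint(0, 10, (4, 3)),
                 columns=pd.MultiIndex.from_product([['A'], abc]))
d2 = pd.DataFrame(np.random.randint(0, 10, (4, 3)), columns=abc)

# Result of the question's for loop, built on a copy so d is untouched.
looped = d.copy()
for c2 in d2:
    looped['B', c2] = d2[c2]

one_shot = add_multicolumn(d, d2, 'B')
print(one_shot.equals(looped))

# concat aligns rows by index label, not by position:
# shifting d2's index by one leaves unmatched rows filled with NaN.
shifted = d2.copy()
shifted.index = shifted.index + 1
misaligned = add_multicolumn(d, shifted, 'B')
print(misaligned.shape)
```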
    

    Old answer:

    You can do it with pd.concat(), passing a dict whose keys become the top level of the column MultiIndex:

    In [35]: d = pd.concat({'A':d['A'], 'B':d2}, axis=1)
    
    In [36]: d
    Out[36]:
       A        B
       a  b  c  a  b  c
    0  7  3  9  0  7  2
    1  9  4  5  0  5  5
    2  7  6  1  5  5  7
    3  2  5  7  4  5  2
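The dict form above lists every top-level group again. A minimal sketch of two equivalent spellings, on the question's seed-0 data: the keys= argument of pd.concat supplies the same labels without a dict, and wrapping only the new frame, pd.concat({'B': d2}, axis=1), lets it be appended to d without re-listing the existing groups. The names via_dict, via_keys and via_wrap are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (seed 0).
np.random.seed(0)
abc = ['a', 'b', 'c']
d = pd.DataFrame(np.random.randint(0, 10, (4, 3)),
                 columns=pd.MultiIndex.from_product([['A'], abc]))
d2 = pd.DataFrame(np.random.randint(0, 10, (4, 3)), columns=abc)

# Answer's form: every group listed explicitly in the dict.
via_dict = pd.concat({'A': d['A'], 'B': d2}, axis=1)

# Same top-level labels supplied through keys= instead of a dict.
via_keys = pd.concat([d['A'], d2], axis=1, keys=['A', 'B'])

# Only the new group is wrapped; the existing groups of d are kept as they are.
via_wrap = pd.concat([d, pd.concat({'B': d2}, axis=1)], axis=1)

print(via_dict.equals(via_keys), via_dict.equals(via_wrap))
```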
    

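    Explanation: d['A'] selects the sub-frame under the top-level label A (its columns are the single-level labels a, b, c), and pd.concat() then places d['A'] and d2 side by side under the dict keys: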

    In [37]: d['A']
    Out[37]:
       a  b  c
    0  7  3  9
    1  9  4  5
    2  7  6  1
    3  2  5  7
    
    In [40]: pd.concat({'A':d['A'], 'B':d2}, axis=1)
    Out[40]:
       A        B
       a  b  c  a  b  c
    0  7  3  9  0  7  2
    1  9  4  5  0  5  5
    2  7  6  1  5  5  7
    3  2  5  7  4  5  2
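d['A'] works as a group selector because, on MultiIndex columns, indexing with a top-level label returns the sub-frame with that level dropped. DataFrame.xs with axis=1 and level=0 is an equivalent, more explicit spelling, and it can also cut on the second level. A short sketch on the question's seed-0 data; the names sub_a and col_a are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (seed 0) and combine it as in the answer.
np.random.seed(0)
abc = ['a', 'b', 'c']
d = pd.DataFrame(np.random.randint(0, 10, (4, 3)),
                 columns=pd.MultiIndex.from_product([['A'], abc]))
d2 = pd.DataFrame(np.random.randint(0, 10, (4, 3)), columns=abc)
d = pd.concat({'A': d['A'], 'B': d2}, axis=1)

# Top-level selection: the 'A' level is dropped, leaving columns a, b, c.
sub_a = d['A']
print(list(sub_a.columns))

# Equivalent cross-section along the column axis.
print(sub_a.equals(d.xs('A', axis=1, level=0)))

# xs can also cut on the second level: column 'a' from every group.
col_a = d.xs('a', axis=1, level=1)
print(list(col_a.columns))
```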