Tags: python, pandas, dataframe, multi-index, multi-level

How to add a multilevel column name to one specific column only (not all columns) in a pandas DataFrame?


Refer here for the question background. I want to add a second-level column name C under column B only, leaving column A without one.

I need the output to look like this:

 df
    Out[92]: 
       A  B
          C
    a  0  0
    b  1  1
    c  2  2
    d  3  3
    e  4  4

I tried this:

dfnew=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

columns=[('c','b')]  #changed from columns=[('c','a'),('c','b')]

dfnew.columns=pd.MultiIndex.from_tuples(columns)

But that doesn't work. It raises: ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements


Solution

  • You can use MultiIndex.from_arrays:

    df.columns = pd.MultiIndex.from_arrays([df.columns, ['','C']])
    
       A  B
          C
    a  0  0
    b  1  1
    c  2  2
    d  3  3
    e  4  4
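
For reference, here is a self-contained sketch of the same call. The sample frame is an assumption, built to match the output shown in the question. from_arrays takes one array per level, so the second array supplies an empty string for every column that should not get a sub-label.

```python
import pandas as pd

# Assumed sample data, chosen to match the output in the question.
df = pd.DataFrame({'A': range(5), 'B': range(5)}, index=list('abcde'))

# One array per level: level 0 keeps the existing names,
# level 1 holds the sub-label ('' where none is wanted).
df.columns = pd.MultiIndex.from_arrays([df.columns, ['', 'C']])

print(df.columns.tolist())  # [('A', ''), ('B', 'C')]

# A column is now selected with its full tuple key.
print(df[('B', 'C')].tolist())
```

Note that the second array must have exactly as many entries as there are columns.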
    

    Note that pd.MultiIndex.from_tuples expects a list of tuples, as the name suggests: one tuple per column, each holding that column's label at every level. The list equivalent to the two arrays above can be built by zipping them:

    list(zip(*[df.columns, ['','C']]))
    # [('A', ''), ('B', 'C')]
    

    Your attempt passed a single tuple for a frame with two columns, which is why you got the length mismatch instead of what you expected.
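
As a sketch of how the attempt from the question could be repaired with from_tuples, the snippet below first reproduces the error and then assigns one tuple per column. Putting 'b' on the top level and 'c' below it is an assumption based on the desired output. The question's own tuple had them the other way around.

```python
import pandas as pd

dfnew = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# The original attempt: one tuple for two columns raises ValueError.
try:
    dfnew.columns = pd.MultiIndex.from_tuples([('c', 'b')])
except ValueError as exc:
    error = str(exc)
    print(error)

# One tuple per existing column; '' leaves the second level blank for 'a'.
columns = [('a', ''), ('b', 'c')]
dfnew.columns = pd.MultiIndex.from_tuples(columns)

print(dfnew.columns.tolist())  # [('a', ''), ('b', 'c')]
```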


    If you want to do the same by specifying a list of columns, you could do:

    cols = [(i, 'C') if i in ['B','D'] else (i, '') for i in df.columns]
    # [('A', ''), ('B', 'C'), ('C', ''), ('D', 'C')]
    df.columns = pd.MultiIndex.from_tuples(cols)
    
       A  B  C  D
          C     C
    a  0  0  0  0
    b  1  1  1  1
    c  2  2  2  2
    d  3  3  3  3
    e  4  4  4  4
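
If different columns need different sub-labels, the list comprehension above could be wrapped in a small helper. The function name add_sublevel and its parameters are hypothetical, not part of pandas. This is only a sketch, assuming a frame with a single-level column index.

```python
import pandas as pd


def add_sublevel(df, sublabels, fill=''):
    """Return a copy of df with a second column level (hypothetical helper).

    sublabels maps a column name to its sub-label; columns that are
    not listed get `fill` on the second level.
    """
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [(col, sublabels.get(col, fill)) for col in df.columns]
    )
    return out


# Assumed sample data, matching the four-column output above.
df = pd.DataFrame({name: range(5) for name in 'ABCD'}, index=list('abcde'))
result = add_sublevel(df, {'B': 'C', 'D': 'C'})

print(result.columns.tolist())
# [('A', ''), ('B', 'C'), ('C', ''), ('D', 'C')]
```

Returning a copy leaves the original frame's columns untouched, which makes the helper safe to call more than once.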