Search code examples
pythonpandasmulti-index

Simple way to create multiindex columns with pandas


I am sorry for asking but I did not get the still existing answers. I simply glued two data frames with the same column names.

|    |   X |   Y |   X |   Y |
|---:|----:|----:|----:|----:|
|  0 |   1 |   3 |   9 |   7 |
|  1 |   2 |   4 |   8 |   6 |

What I want is

|    |    FOO    |    BAR    |
|    |   X |   Y |   X |   Y |
|---:|----:|----:|----:|----:|
|  0 |   1 |   3 |   9 |   7 |
|  1 |   2 |   4 |   8 |   6 |

I tried pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']]) but this results in

MultiIndex([('X', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'FOO'),
            ('Y', 'BAR'),
            ('X', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'FOO'),
            ('Y', 'BAR')],
           )

But I need

MultiIndex([('X', 'FOO'),
            ('Y', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'BAR')],
           )

This is an MWE

#!/usr/bin/env python3
import pandas as pd

a = pd.DataFrame({'X': [1,2], 'Y': [3, 4]})
b = pd.DataFrame({'X': [9,8], 'Y': [7, 6]})

c = pd.concat([a, b], axis=1)

# throws a ValueError: Length mismatch: Expected axis has 4 elements, new values have 8 elements
c.columns = pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']])

Would it help to do something to the two separate DataFrames before I concat() them?


Solution

  • You could simply add the extra level using numpy.repeat and pandas.MultiIndex:

    import numpy as np
    
    extra = ['FOO', 'BAR']
    c.columns = pd.MultiIndex.from_arrays([np.repeat(extra, len(c.columns)//len(extra)),
                                           c.columns])
    

    output:

      FOO    BAR   
        X  Y   X  Y
    0   1  3   9  7
    1   2  4   8  6
    

    NB. If the columns are shuffled, sort the columns and use np.tile instead:

    c = c.sort_index(axis=1)
    
    extra = ['FOO', 'BAR']
    c.columns = pd.MultiIndex.from_arrays([np.tile(extra, len(c.columns)//len(extra)),
                                           c.columns])
    

    output:

      FOO BAR FOO BAR
        X   X   Y   Y
    0   1   9   3   7
    1   2   8   4   6