Search code examples
pythonpandasdataframemulti-index

Creating multi-index using str.extract in pandas dataframe


I have a dataframe with columns and rows that have names as below index:

idx = ['CAN_agr', 'CAN_ser', 'USA_agr', 'USA_ser', 'MEX_agr', 'MEX_ser']
sample = pd.DataFrame(
[[1, 2, 3, 4, 5 ,6],
[7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]],
index = idx,
columns=idx
)

I wanted to use this names to create multi-index where level=0 will be name of countries (i.e. CAN, USA, MEX) and level=1 will have the industry type(i.e. agr, ser) for each country. I thought of using str.extract, but I was not sure how I could separate the labels according to countries and industry types and assign them as the multi-index columns and rows. Could anyone help me how I should approach this issue? Thank you very much!


Solution

  • split index and columns on _ and then assign them back as index and columns should do:

    sample.index = sample.index.str.split('_', expand=True)
    sample.columns = sample.columns.str.split('_', expand=True)
    sample
    
            CAN     USA     MEX    
            agr ser agr ser agr ser
    CAN agr   1   2   3   4   5   6
        ser   7   8   9  10  11  12
    USA agr  13  14  15  16  17  18
        ser  19  20  21  22  23  24
    MEX agr  25  26  27  28  29  30
        ser  31  32  33  34  35  36