Search code examples
pythonpandasdataframemulti-index

Why transpose data to get a multiindexed dataframe?


I'm a bit confused with data orientation when creating a Multiindexed DataFrame from a DataFrame. I import data with read_excel() and I begin with something like:

import pandas as pd
df = pd.DataFrame([['A', 'B', 'A', 'B'], [1, 2, 3, 4]],
columns=['k', 'k', 'm', 'm'])
df

Out[3]: 
   k  k  m  m
0  A  B  A  B
1  1  2  3  4

I want to multiindex this and to obtain:

   A  B  A  B
   k  k  m  m
0  1  2  3  4

Mainly from Pandas' doc, I did:

arrays = df.iloc[0].tolist(), list(df)
tuples = list(zip(*arrays))
multiindex = pd.MultiIndex.from_tuples(tuples, names=['topLevel', 'downLevel'])
df = df.drop(0)

If I try

df2 = pd.DataFrame(df.values, index=multiindex)
(...)
ValueError: Shape of passed values is (4, 1), indices imply (4, 4)

I then have to transpose the values:

df2 = pd.DataFrame(df.values.T, index=multiindex)
df2
Out[11]: 
                    0
topLevel downLevel   
A        k          1
B        k          2
A        m          3
B        m          4

Last I re-transpose this dataframe to obtain:

df2.T
Out[12]: 
topLevel   A  B  A  B
downLevel  k  k  m  m
0          1  2  3  4

OK, this is what I want, but I don't understand why I have to transpose 2 times. It seems useless.


Solution

  • You can create the MultiIndex yourself, and then drop the row. From your starting df:

    import pandas as pd
    
    df.columns = pd.MultiIndex.from_arrays([df.iloc[0], df.columns], names=[None]*2)
    df = df.iloc[1:].reset_index(drop=True)
    
       A  B  A  B
       k  k  m  m
    0  1  2  3  4