Search code examples
pythonpandasdataframemulti-index

How to create hiearchical multi-index dataframe from multiple dataframes?


I have multiple dataframes with identical shape, say:

d1 = {'time': [1,2,3,4], 'A': [55.5,55.5,55.5,55.5], 'B':[55.5,55.5,55.5,55.5]}
d2 = {'time': [1,2,3,4], 'A': [7,6,5,4], 'B':[9,8,7,6]}
d3 = {'time': [1,2,3,4], 'A': [1,2,3,4], 'B':[2,3,4,5]}

df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)
df3 = pd.DataFrame(data = d3)

They will always have the same column names and the entries in the "time" column will always be identical. I want to merge them in a way, that I get a hierarchical index, i.e.

index  frame     'time'    'A'       'B'
0      1         1         55.5      55.5
       2         1         7         9
       3         1         1         2
1      1         2         55.5      55.5
       2         2         6         8
       3         2         2         3
...

I have already experimented with .merge() and .join() and .concatenate() with various options, but I seem unable to wrap my head around it. What is the most elegant way to achieve this?


Solution

  • IIUC, you can concat and sort the index:

    dfs = [df1, df2, df3]
    
    (pd
     .concat(dict(enumerate(dfs, start=1)), names=['frame', 'index'])
     .sort_index(level='index')
     .swaplevel()
    )
    

    output:

                 time     A     B
    index frame                  
    0     1         1  55.5  55.5
          2         1   7.0   9.0
          3         1   1.0   2.0
    1     1         2  55.5  55.5
          2         2   6.0   8.0
          3         2   2.0   3.0
    2     1         3  55.5  55.5
          2         3   5.0   7.0
          3         3   3.0   4.0
    3     1         4  55.5  55.5
          2         4   4.0   6.0
          3         4   4.0   5.0