Tags: python, pandas, dataframe, multi-index

Split a multi-index dataframe in dataframes by column names


I have a dataframe like the following: [image: multi-index dataframe by columns]

I would like to get 3 dataframes, one named after each second-level column (compass, accel, gyro), each with the time index untouched and three columns (df1, df2, df3).

I've tried for index, row in df.iterrows(): but couldn't really get it to work. I was also thinking of something with stack() and unstack(), but I don't really know how.


Solution

  • groupby lets you split the DataFrame along one level of the column MultiIndex, putting columns that share the same value at that level into the same group. DataFrame.xs then removes the grouping level from each group, leaving only the columns you care about. The separate DataFrames are stored in a dictionary keyed by the unique level-1 values of the original column MultiIndex.

    Sample Data

    import pandas as pd
    import numpy as np
    np.random.seed(123)
    df = pd.DataFrame(np.random.randint(1, 10, (4, 9)),
                      columns=pd.MultiIndex.from_product([['df1', 'df2', 'df3'],
                                                          ['compass', 'gyro', 'accel']]))
    #      df1                df2                df3           
    #  compass gyro accel compass gyro accel compass gyro accel
    #0       3    3     7       2    4     7       2    1     2
    #1       1    1     4       5    1     1       5    2     8
    #2       4    3     5       8    3     5       9    1     8
    #3       4    5     7       2    6     7       3    2     9
    

    Code

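    # Group the columns by their second level (level=1), then drop that level
    # from each group with xs so only df1, df2, df3 remain as column labels.
    # Note: groupby(..., axis=1) is deprecated since pandas 2.1.
    d = {idx: gp.xs(idx, level=1, axis=1) for idx, gp in df.groupby(level=1, axis=1)}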
    d['gyro']
    #   df1  df2  df3
    #0    3    4    1
    #1    1    1    2
    #2    3    3    1
    #3    5    6    2
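
Because the axis argument of groupby is deprecated in recent pandas releases, the same dictionary can also be built without groupby at all. The following is a minimal sketch under that assumption, not part of the original answer; the names sensors and frames are illustrative only. It selects each second-level label directly with DataFrame.xs, which leaves the row index untouched.

```python
import numpy as np
import pandas as pd

# Same sample frame as above: outer level df1..df3, inner level sensor name.
np.random.seed(123)
df = pd.DataFrame(
    np.random.randint(1, 10, (4, 9)),
    columns=pd.MultiIndex.from_product(
        [["df1", "df2", "df3"], ["compass", "gyro", "accel"]]
    ),
)

# Unique labels of the second column level (hypothetical variable name).
sensors = df.columns.get_level_values(1).unique()

# One DataFrame per label; xs drops the selected level from the columns,
# so each result has the plain columns df1, df2, df3.
frames = {name: df.xs(name, level=1, axis=1) for name in sensors}

print(frames["gyro"])
```

Each value in frames keeps the original row index, so a time index on the source frame would carry over unchanged.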
    

    Because such splits are readily available from a groupby, you may not even need to store the separate DataFrames; you can manipulate each of them separately with GroupBy.apply.
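
As a rough illustration of working on the groups without storing them, the sketch below computes the row-wise mean of each sensor across df1, df2 and df3. It is an assumption-laden example rather than the answer's own code: it uses a built-in aggregation instead of GroupBy.apply, and it groups the transposed frame to avoid the deprecated axis=1 argument. The name per_sensor_mean is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample frame as above.
np.random.seed(123)
df = pd.DataFrame(
    np.random.randint(1, 10, (4, 9)),
    columns=pd.MultiIndex.from_product(
        [["df1", "df2", "df3"], ["compass", "gyro", "accel"]]
    ),
)

# Transpose so the column MultiIndex becomes the row index, group by the
# sensor level, aggregate, then transpose back to the original orientation.
per_sensor_mean = df.T.groupby(level=1).mean().T

print(per_sensor_mean)
```

The result has one column per sensor (sorted by label, as groupby does by default) and the same rows as the original frame.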