Search code examples
pandasmulti-index

Combine Pandas DataFrames while creating MultiIndex Columns


I have two DataFrames, something like this:

import pandas as pd

dates = pd.Index(['2016-10-03', '2016-10-04', '2016-10-05'], name='Date')

close = pd.DataFrame( {'AAPL': [112.52,  113., 113.05],
                       'CSCO': [  31.5, 31.35, 31.59 ],
                       'MSFT': [ 57.42, 57.24, 57.64 ] }, index = dates )

volume= pd.DataFrame( {'AAPL': [21701800, 29736800, 21453100] ,
                       'CSCO': [14070500, 18460400, 11808600] ,
                       'MSFT': [19189500, 20085900, 16726400] }, index = dates )

The output of DataFrame 'close' looks like this:

              AAPL   CSCO   MSFT
Date
2016-10-03  112.52  31.50  57.42
2016-10-04  113.00  31.35  57.24
2016-10-05  113.05  31.59  57.64

And the output of DataFrame 'volume' looks like this:

                AAPL      CSCO      MSFT
Date
2016-10-03  21701800  14070500  19189500
2016-10-04  29736800  18460400  20085900
2016-10-05  21453100  11808600  16726400

I would like to combine these two DataFrames into a single DataFrame with MultiIndex COLUMNS so that it looks like this:

              AAPL                CSCO                MSFT
             Close     Volume    Close     Volume    Close     Volume  
Date
2016-10-03  112.52   21701800    31.50   14070500    57.42   19189500
2016-10-04  113.00   29736800    31.35   18460400    57.24   20085900
2016-10-05  113.05   21453100    31.59   11808600    57.64   16726400

Can anyone give me an idea how to do that? I've been playing with pd.concat and pd.merge, but it's not clear to me how to get it to line up on the date index and allow me to provide names for the sub-index ('Close" and 'Volume') on the columns.


Solution

  • You can use the keys kwarg of concat:

    In [11]: res = pd.concat([close, volume], axis=1, keys=["close", "volume"])
    
    In [12]: res
    Out[12]:
                 close                  volume
                  AAPL   CSCO   MSFT      AAPL      CSCO      MSFT
    Date
    2016-10-03  112.52  31.50  57.42  21701800  14070500  19189500
    2016-10-04  113.00  31.35  57.24  29736800  18460400  20085900
    2016-10-05  113.05  31.59  57.64  21453100  11808600  16726400
    

    With a little rearrangement:

    In [13]: res.swaplevel(0, 1, axis=1).sort_index(axis=1)
    Out[13]:
                  AAPL             CSCO             MSFT
                 close    volume  close    volume  close    volume
    Date
    2016-10-03  112.52  21701800  31.50  14070500  57.42  19189500
    2016-10-04  113.00  29736800  31.35  18460400  57.24  20085900
    2016-10-05  113.05  21453100  31.59  11808600  57.64  16726400