
Concat two DataFrames on Pandas without Symbol, only Dates (from pd.datareader)


I've imported stock market data using pandas_datareader:

from pandas_datareader import data
import pandas as pd
import datetime

# Ten years of daily prices for each bank
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
boaml = data.DataReader('BAC', 'morningstar', start, end)
citi = data.DataReader('C', 'morningstar', start, end)

The data looks clean, as the output of citi.head() shows:

                    Close   High    Low     Open    Volume
Symbol  Date
C       2006-01-02  485.3   487.1   482.2   483.5   0
        2006-01-03  492.9   493.8   481.1   490.0   1536700
        2006-01-04  483.8   491.0   483.5   488.6   1852790
        2006-01-05  486.2   487.8   484.0   484.4   1015470
        2006-01-06  486.2   489.0   482.0   488.8   1358930

However, when I concatenate them with pd.concat(), I get NaN in the upper-right and lower-left corners of the result:

bank_stocks = pd.concat([boaml, citi], axis=1, join='outer')

Look at bank_stocks.head():

                    Close  High   Low    Open   Volume      Close  High  Low  Open  Volume
Symbol  Date
BAC     2006-01-02  46.15  46.36  45.91  46.02  0.0         NaN    NaN   NaN  NaN   NaN
        2006-01-03  47.08  47.18  46.15  46.92  16197900.0  NaN    NaN   NaN  NaN   NaN
        2006-01-04  46.58  47.24  46.45  47.00  17427400.0  NaN    NaN   NaN  NaN   NaN
        2006-01-05  46.64  46.83  46.32  46.58  14668900.0  NaN    NaN   NaN  NaN   NaN
        2006-01-06  46.57  46.91  46.35  46.80  11965700.0  NaN    NaN   NaN  NaN   NaN

And bank_stocks.tail():

                    Close  High  Low  Open  Volume  Close  High   Low    Open   Volume
Symbol  Date
C       2015-12-28  NaN    NaN   NaN  NaN   NaN     52.38  52.57  51.96  52.57  8760674.0
        2015-12-29  NaN    NaN   NaN  NaN   NaN     52.98  53.22  52.74  52.76  10153634.0
        2015-12-30  NaN    NaN   NaN  NaN   NaN     52.30  52.94  52.25  52.84  8763137.0
        2015-12-31  NaN    NaN   NaN  NaN   NaN     51.75  52.39  51.75  52.07  11275231.0
        2016-01-01  NaN    NaN   NaN  NaN   NaN     51.75  51.75  51.75  51.75  0.0

(Apologies in advance if the output isn't clear; I hope the code makes the problem easy to reproduce.)

I understand that the problem lies in the Symbol level of the index. I have tried working with the MultiIndex, but it did not work.

How can I get a single DataFrame that holds the stock data for both boaml and citi side by side under the same Date, without the NaN values?


Solution

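  • Level 0 of your MultiIndex, 'Symbol', is causing the issue. The two frames share no (Symbol, Date) labels, so pd.concat cannot align any of their rows. Drop that level so both frames are indexed by Date only, then concat: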

    citi.index = citi.index.droplevel()
    boaml.index = boaml.index.droplevel()
    
    pd.concat([citi.add_suffix('_citi'), boaml.add_suffix('_boaml')], axis = 1)
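
Since the 'morningstar' source may not be reachable when you try this, here is a minimal sketch of the same fix on small made-up frames. The helper make_frame, the three dates, and the Volume values are assumptions for illustration only; they just mimic the (Symbol, Date) index that DataReader returned in the question.

```python
import pandas as pd

# Made-up sample dates; stand-ins for the real trading days.
dates = pd.to_datetime(["2006-01-02", "2006-01-03", "2006-01-04"])


def make_frame(symbol, closes):
    """Hypothetical helper: build a frame indexed by (Symbol, Date)."""
    index = pd.MultiIndex.from_product(
        [[symbol], dates], names=["Symbol", "Date"]
    )
    return pd.DataFrame({"Close": closes, "Volume": [0, 100, 200]}, index=index)


boaml = make_frame("BAC", [46.15, 47.08, 46.58])
citi = make_frame("C", [485.3, 492.9, 483.8])

# Aligning on (Symbol, Date): no label is shared between the two frames,
# so the outer join stacks the rows and pads the other half with NaN.
misaligned = pd.concat([boaml, citi], axis=1)

# Drop the Symbol level so both frames are indexed by Date only.
boaml_by_date = boaml.droplevel("Symbol")
citi_by_date = citi.droplevel("Symbol")

# Option 1: flat column names with a suffix per bank.
bank_stocks = pd.concat(
    [boaml_by_date.add_suffix("_boaml"), citi_by_date.add_suffix("_citi")],
    axis=1,
)

# Option 2: keep the original column names and add the ticker as a
# top column level via keys=.
bank_stocks_keyed = pd.concat(
    [boaml_by_date, citi_by_date], axis=1, keys=["BAC", "C"]
)

print(bank_stocks)
print(bank_stocks_keyed)
```

With the keys= variant you can select one bank with bank_stocks_keyed["C"] or a single series with bank_stocks_keyed[("C", "Close")], which may be more convenient than suffixes if you plan to add more tickers.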