I would like to make a somewhat complex multi-indexed column pandas dataframe concatenation. This concat consists of two or more dataframes in the following format:
indicator eps2p ... logreturn
stock TIET11 ALSC3 ALPA4 ... TIET11 ALSC3 ALPA4
date
2000-01-31 NaN NaN NaN ... NaN NaN NaN
2000-02-01 100 NaN NaN ... NaN NaN NaN
2000-02-02 101 NaN NaN ... NaN NaN NaN
2000-02-03 104 NaN NaN ... NaN NaN NaN
2000-02-04 NaN NaN NaN ... NaN NaN NaN
Now, imagine that I have two more dataframes just like the one above, but instead of eps2p
and logreturn
, I have ev2ebitda
and logreturn
, and ev2ebit
and logreturn
in the indicator
column level. Also, imagine that not all of the dataframes have the exact same stock
columns under each one of the indicators. Finally, also consider that not necessarily two dataframes with the same stock
under the same indicator
have the exact same index, i.e., I could have two dataframes with the column logreturn/TIET11
, but with complementary data:
DATAFRAME #1
logreturn
TIET11
date
2000-01-31 100
2000-02-01 NaN
2000-02-02 102
DATAFRAME #2
logreturn
TIET11
date
2000-01-31 100
2000-02-01 101
2000-02-02 NaN
With that in mind, how do I concat these dataframes into a single one that has all the unique indicator columns (eps2p
, ev2ebit
and ev2ebitda
) and the common one (logreturn
) in way that I don't have repeated data or lost data from any of the concatenated dataframes?
You can use update
to do this:
In [11]: df1.update(df2)
In [12]: df1
Out[12]:
logreturn
TIET11
date
2000-01-31 100.0
2000-02-01 101.0
2000-02-02 102.0
Since there are missing columns in df1 you can first reindex with the union:
res = df1.reindex(columns=df3.columns | df4.columns)
res.update(df2)
Note: You may need to reindex the index too.