python pandas dataframe merge multi-index

How to concatenate multi-indexed column dataframes

I would like to make a somewhat complex multi-indexed column pandas dataframe concatenation. This concat consists of two or more dataframes in the following format:

indicator     eps2p                         ... logreturn                      
stock         TIET11     ALSC3     ALPA4    ...   TIET11  ALSC3 ALPA4    

date
2000-01-31       NaN       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-01       100       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-02       101       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-03       104       NaN       NaN    ...     NaN   NaN   NaN 
2000-02-04       NaN       NaN       NaN    ...     NaN   NaN   NaN

Now, imagine that I have two more dataframes just like the one above, but instead of eps2p and logreturn, I have ev2ebitda and logreturn, and ev2ebit and logreturn in the indicator column level. Also, imagine that not all of the dataframes have the exact same stock columns under each one of the indicators. Finally, also consider that not necessarily two dataframes with the same stock under the same indicator have the exact same index, i.e., I could have two dataframes with the column logreturn/TIET11, but with complementary data:

DATAFRAME #1                   
              logreturn     
              TIET11        
date                        
2000-01-31    100       
2000-02-01    NaN       
2000-02-02    102

DATAFRAME #2                   
              logreturn     
              TIET11        
date                        
2000-01-31    100       
2000-02-01    101       
2000-02-02    NaN

With that in mind, how do I concat these dataframes into a single one that has all the unique indicator columns (eps2p, ev2ebit and ev2ebitda) and the common one (logreturn) in way that I don't have repeated data or lost data from any of the concatenated dataframes?

Solution

You can use update to do this:

In [11]: df1.update(df2)

In [12]: df1
Out[12]:
           logreturn
              TIET11
date
2000-01-31     100.0
2000-02-01     101.0
2000-02-02     102.0

Since there are missing columns in df1 you can first reindex with the union:

res = df1.reindex(columns=df3.columns | df4.columns)
res.update(df2)

Note: You may need to reindex the index too.