Search code examples
pythonpandasdataframemergemulti-index

How to concatenate multi-indexed column dataframes


I would like to make a somewhat complex multi-indexed column pandas dataframe concatenation. This concat consists of two or more dataframes in the following format:

indicator     eps2p                         ... logreturn                      
stock         TIET11     ALSC3     ALPA4    ...   TIET11  ALSC3 ALPA4    

date
2000-01-31       NaN       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-01       100       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-02       101       NaN       NaN    ...     NaN   NaN   NaN  
2000-02-03       104       NaN       NaN    ...     NaN   NaN   NaN 
2000-02-04       NaN       NaN       NaN    ...     NaN   NaN   NaN 

Now, imagine that I have two more dataframes just like the one above, but instead of eps2p and logreturn, I have ev2ebitda and logreturn, and ev2ebit and logreturn in the indicator column level. Also, imagine that not all of the dataframes have the exact same stock columns under each one of the indicators. Finally, also consider that not necessarily two dataframes with the same stock under the same indicator have the exact same index, i.e., I could have two dataframes with the column logreturn/TIET11, but with complementary data:

DATAFRAME #1                   
              logreturn     
              TIET11        
date                        
2000-01-31    100       
2000-02-01    NaN       
2000-02-02    102       
DATAFRAME #2                   
              logreturn     
              TIET11        
date                        
2000-01-31    100       
2000-02-01    101       
2000-02-02    NaN       

With that in mind, how do I concat these dataframes into a single one that has all the unique indicator columns (eps2p, ev2ebit and ev2ebitda) and the common one (logreturn) in way that I don't have repeated data or lost data from any of the concatenated dataframes?


Solution

  • You can use update to do this:

    In [11]: df1.update(df2)
    
    In [12]: df1
    Out[12]:
               logreturn
                  TIET11
    date
    2000-01-31     100.0
    2000-02-01     101.0
    2000-02-02     102.0
    

    Since there are missing columns in df1 you can first reindex with the union:

    res = df1.reindex(columns=df3.columns | df4.columns)
    res.update(df2)
    

    Note: You may need to reindex the index too.