Search code examples
pythonpandasdataframemergeconcatenation

How to merge 2 Python dataframes with different columns but possible same datetime index avoiding index duplicates?


I have 2 dataframes that with different column names and indexed by datetime:

df1

            A  B
time            
2011-03-01  1  6
2011-03-02  4  8
2011-03-08  5  2
2011-03-09  6  3

df2

            C  D
time            
2011-03-01  8  7
2011-03-02  9  6
2011-03-07  4  4
2011-03-08  1  2

I want to merge them to obtain something like this:

            A  B  C  D
time                          
2011-03-01  1  6  8  7
2011-03-02  4  8  9  3
2011-03-07  NaN  NaN  4  4
2011-03-08  5  2  1  2
2011-03-09  6  3  NaN  NaN

instead using the concat command:

df = pd.concat([df1, df2], axis=0).sort_index()

I get the following merged dataframe:

              A    B    C    D
time                          
2011-03-01  1.0  6.0  NaN  NaN
2011-03-01  NaN  NaN  8.0  7.0
2011-03-02  4.0  8.0  NaN  NaN
2011-03-02  NaN  NaN  9.0  6.0
2011-03-07  NaN  NaN  4.0  4.0
2011-03-08  5.0  2.0  NaN  NaN
2011-03-08  NaN  NaN  1.0  2.0
2011-03-09  6.0  3.0  NaN  NaN

with undesired duplicated indexes!

How can I merge the 2 DFs properly?


Solution

  • pandas.concat is the way to go, using axis=1.

    If you still have issues with axis=1, then this means your indexes don't align (maybe a different type) and you will have the same issue with join or merge.

    df1 = pd.DataFrame({'A': [1,4,5,6], 'B': [6,8,2,3]},
                       index=['2011-03-01', '2011-03-02', '2011-03-08', '2011-03-09'])
    
    df2 = pd.DataFrame({'C': [8,9,1,4], 'D': [7,6,2,4]},
                       index=['2011-03-01', '2011-03-02', '2011-03-07', '2011-03-08'])
    
    pd.concat([df1, df2], axis=1).sort_index()
    

    output:

                  A    B    C    D
    2011-03-01  1.0  6.0  8.0  7.0
    2011-03-02  4.0  8.0  9.0  6.0
    2011-03-07  NaN  NaN  1.0  2.0
    2011-03-08  5.0  2.0  4.0  4.0
    2011-03-09  6.0  3.0  NaN  NaN