
Concatenate 2 dataframes in order to append some columns


I have 2 dataframes (df1 and df2) and I want to append them as follows:

  • df1 and df2 have some columns in common. I want to append the columns that exist in df2 but not in df1, while keeping the columns of df1 as they are.
  • df2 is empty (all rows are NaN).

I could just add the columns to df1 by hand, but df2 may gain new columns in the future, so I do not want to hardcode the column names; I would rather have this done automatically. I used to use append, but now I get the following message:

df_new = df1.append(df2)

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead

I tried the following:

df_new = pd.concat([df1, df2], axis=1)

but this places all the columns of both dataframes side by side, so the columns they share appear twice.
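A minimal reproduction of this behaviour, using made-up frames (the column names and values are illustrative assumptions, not the real data):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": [3, 3, 3], "col2": [4, 4, 4]})
# df2 shares col1/col2 with df1, adds col3, and holds only NaN
df2 = pd.DataFrame(np.nan, index=range(3), columns=["col1", "col2", "col3"])

# axis=1 puts the frames next to each other, so shared columns are duplicated
side_by_side = pd.concat([df1, df2], axis=1)
print(list(side_by_side.columns))
# ['col1', 'col2', 'col1', 'col2', 'col3']
```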


Solution

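  • You could use pd.concat() with axis=0 (the default) and join='outer' (also the default). The result has the union of the columns, and the rows coming from df1 get NaN in the columns that only exist in df2. Here is an example: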

    import pandas as pd

    df1 = pd.DataFrame({'col1': [3,3,3],
                        'col2': [4,4,4]})
    df2 = pd.DataFrame({'col1': [1,2,3],
                        'col2': [1,2,3],
                        'col3': [1,2,3],
                        'col4': [1,2,3]})
    print(df1)
       col1  col2
    0     3     4
    1     3     4
    2     3     4
    
    print(df2)
       col1  col2  col3  col4
    0     1     1     1     1
    1     2     2     2     2
    2     3     3     3     3
    
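    # Stack the rows; the outer join keeps the union of the columns.
    # col3 and col4 become float because they now contain NaN.
    df3 = pd.concat([df1, df2], axis=0, join='outer')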
    
    print(df3)
       col1  col2  col3  col4
    0     3     4   NaN   NaN
    1     3     4   NaN   NaN
    2     3     4   NaN   NaN
    0     1     1   1.0   1.0
    1     2     2   2.0   2.0
    2     3     3   3.0   3.0
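
If the goal is only to pick up the columns that df2 has and df1 lacks, without also stacking the rows of df2 under df1, one option is to reindex the columns of df1. This is a sketch of an alternative rather than part of the answer above; it assumes df2 contributes nothing but its column names (as in the question, where all of its rows are NaN), and the names new_cols and df_new are just illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [3, 3, 3], "col2": [4, 4, 4]})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 2, 3],
                    "col3": [1, 2, 3], "col4": [1, 2, 3]})

# Columns present in df2 but missing from df1, kept in df2's order
new_cols = [c for c in df2.columns if c not in df1.columns]

# reindex keeps df1's rows and values and fills the new columns with NaN
df_new = df1.reindex(columns=list(df1.columns) + new_cols)
print(df_new)
```

Here df_new keeps the three rows of df1 and gains col3 and col4 filled with NaN. Nothing is hardcoded, so any column added to df2 later is picked up automatically.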