Search code examples
pythonpandasdataframeappendconcatenation

How to concat multiple dataframes dynamically, with different columns?


I have one main DF, with all the columns. Often I need to concat a new DF, which sometimes has the same number of columns, but sometimes has less. The idea is to have the following logic:

  • standard DF1 already exists, we should append new data to it
  • if headers are the same --> concat in a standard way
  • if new DF2 doesn't have col X --> create new col X in DF2 with NaN values, then append it to main DF1
  • if new DF3 has new column Y, that doesn't exist yet in DF1, then create this column for old DF1 and add NaN values to it

Short version: I would like to concat multiple dataframes, and in case that there is a new column, expand existing DF with NaN values for that column.

Example of 3 different DFs:

separated DFs

Example of wanted outcome:

Concatenated DFs

My question is: is it possible to do that dynamically, without specifying columns every time.


Solution

  • pd.concat will cope with that without any adjustment, consider following simple example

    import pandas as pd
    df1 = pd.DataFrame({"City":["Berlin","Paris"],"Country":["DE","FR"],"Phone":[111,122]})
    df2 = pd.DataFrame({"City":["Amstera","Copenhag"],"Country":["NL","DK"]})
    df3 = pd.DataFrame({"City":["Vienna","Madrid"],"Country":["AT","ES"],"Phone":[222,343],"Street":["Leoplstr","Avenia"]})
    df = pd.concat([df1,df2,df3])
    print(df)
    

    gives output

           City Country  Phone    Street
    0    Berlin      DE  111.0       NaN
    1     Paris      FR  122.0       NaN
    0   Amstera      NL    NaN       NaN
    1  Copenhag      DK    NaN       NaN
    0    Vienna      AT  222.0  Leoplstr
    1    Madrid      ES  343.0    Avenia
    

    Note: I altered slightly sample data for brevity sake.

    (tested in pandas 1.5.2)