Search code examples
pythonpandasdataframeconcatenation

When using pd.concat I get extra columns


When I join my two dataframes I encounter a problem with my output. The function pd.concat adds the rows of my second dataframe to my first dataframe even though the columns also appear in the first dataframe.

The dataframes:

df1 = pd.DataFrame([
    ['A1',  'B2',  '2010-12-06', '2014-03-14',  'E1',  'F1',   'G2'],
    ['A2',  'B2',  '2014-04-13', '2017-10-01',  'E2',  'F2',   'G2'],
    ['A3',  'B2',  '2017-11-01',  None       ,  'E3',  'F3',   'G2'],
    ['A10', 'B10', '2010-12-06', '2014-03-14 ', 'E10', 'F10', 'G10'],
    ['A11', 'B10', '2014-04-13', '2017-10-01',  'E11', 'F11', 'G10'],
    ['A12', 'B10', '2018-11-01',  None,         'E11', 'F12', 'G10'],
    ['A14', 'B14', '2010-12-06', '2014-03-14',  'E14', 'F14', 'G14'],
    ['A15', 'B14', '2016-04-13', '2017-10-01',  'E15', 'F15', 'G14']], 
   columns=['Baum1', 'Baum2', 'date1', 'date2', 'Baum5', 'Baum6', 'Baum7']) 

The second one is basically just 4 rows of df1.

df2 = pd.DataFrame([
    ['B2',  'G2', '2010-12-06', '2014-03-14'],
    ['B2',  'G2', '2014-04-13', '2017-10-01'],
    ['B2',  'G2', '2017-11-01',  None       ]], 
   columns=[ 'Baum2','Baum7',date1, date2])

So what I want the final dataframe to look like:

df3 = pd.DataFrame([
    ['A1',  'B2',  '2010-12-06', '2014-03-14',  'E1',  'F1',  'G2'],
    ['A2',  'B2',  '2014-04-13', '2017-10-01',  'E2',  'F2',  'G2'],
    ['A3',  'B2',  '2017-11-01',  None       ,  'E3',  'F3',  'G2']], 
   columns=['Baum1', 'Baum2', 'date1', 'date2', 'Baum5', 'Baum6', 'Baum7']) 

If I use df3 = pd.concat([df1, df2],axis=1, join='inner') it just adds the 4 columns of df1 to df2 but I don't want that. I tried using merge but this error occurs: You are trying to merge on datetime64[ns] and object columns. If you wish to proceed you should use pd.concat.


Solution

  • Use merge:

    >>> df1.merge(df2, how="inner")
    
      Baum1 Baum2       date1       date2 Baum5 Baum6 Baum7
    0    A1    B2  2010-12-06  2014-03-14    E1    F1    G2
    1    A2    B2  2014-04-13  2017-10-01    E2    F2    G2
    2    A3    B2  2017-11-01        None    E3    F3    G2