When I join my two dataframes I encounter a problem with my output. The function pd.concat adds the rows of my second dataframe to my first dataframe even though the columns also appear in the first dataframe.
The dataframes:
df1 = pd.DataFrame([
['A1', 'B2', '2010-12-06', '2014-03-14', 'E1', 'F1', 'G2'],
['A2', 'B2', '2014-04-13', '2017-10-01', 'E2', 'F2', 'G2'],
['A3', 'B2', '2017-11-01', None , 'E3', 'F3', 'G2'],
['A10', 'B10', '2010-12-06', '2014-03-14 ', 'E10', 'F10', 'G10'],
['A11', 'B10', '2014-04-13', '2017-10-01', 'E11', 'F11', 'G10'],
['A12', 'B10', '2018-11-01', None, 'E11', 'F12', 'G10'],
['A14', 'B14', '2010-12-06', '2014-03-14', 'E14', 'F14', 'G14'],
['A15', 'B14', '2016-04-13', '2017-10-01', 'E15', 'F15', 'G14']],
columns=['Baum1', 'Baum2', 'date1', 'date2', 'Baum5', 'Baum6', 'Baum7'])
The second one is basically just 4 rows of df1.
df2 = pd.DataFrame([
['B2', 'G2', '2010-12-06', '2014-03-14'],
['B2', 'G2', '2014-04-13', '2017-10-01'],
['B2', 'G2', '2017-11-01', None ]],
columns=[ 'Baum2','Baum7',date1, date2])
So what I want the final dataframe to look like:
df3 = pd.DataFrame([
['A1', 'B2', '2010-12-06', '2014-03-14', 'E1', 'F1', 'G2'],
['A2', 'B2', '2014-04-13', '2017-10-01', 'E2', 'F2', 'G2'],
['A3', 'B2', '2017-11-01', None , 'E3', 'F3', 'G2']],
columns=['Baum1', 'Baum2', 'date1', 'date2', 'Baum5', 'Baum6', 'Baum7'])
If I use df3 = pd.concat([df1, df2],axis=1, join='inner')
it just adds the 4 columns of df1 to df2 but I don't want that. I tried using merge but this error occurs:
You are trying to merge on datetime64[ns] and object columns. If you wish to proceed you should use pd.concat
.
Use merge
:
>>> df1.merge(df2, how="inner")
Baum1 Baum2 date1 date2 Baum5 Baum6 Baum7
0 A1 B2 2010-12-06 2014-03-14 E1 F1 G2
1 A2 B2 2014-04-13 2017-10-01 E2 F2 G2
2 A3 B2 2017-11-01 None E3 F3 G2