I would like to merge two dataframes which have only one column say df1
and df2
as below. Expected output dataframe as df3
.
import pandas as pd
data1 = [
'A',
'B',
'C']
df1 = pd.DataFrame(data1,columns=['name'])
data2 = [
'B',
'C',
'D']
df2 = pd.DataFrame(data2,columns=['name'])
data3 = [
['A',],
['B','B'],
['C','C'],
[None,'D']]
df3 = pd.DataFrame(data3,columns=['name_x','name_y'])
print(df1)
print(df2)
print(df3)
Output:
name
0 A
1 B
2 C
name
0 B
1 C
2 D
name_x name_y
0 A None
1 B B
2 C C
3 None D
Should I use merge
to do it or any other way?
Quick hack, you could merge
passing one of the keys as Series:
df1.merge(df2, left_on='name', right_on=df2['name'], how='outer').drop(columns='name')
Output:
name_x name_y
0 A NaN
1 B B
2 C C
3 NaN D
If you don't drop
you'll also get a column with the merged values:
df1.merge(df2, left_on='name', right_on=df2['name'], how='outer')
name name_x name_y
0 A A NaN
1 B B B
2 C C C
3 D NaN D
If you had extra columns in the input, this would give you:
df1.merge(df2, left_on='name', right_on=df2['name'], how='outer')
name name_x col1 name_y col2
0 A A x NaN NaN
1 B B x B x
2 C C x C x
3 D NaN NaN D x
If you have more than two inputs to combine, you can generalize with concat
:
dfs = [df1, df2, df1, df2]
out = pd.concat(
[
d.set_axis(d['name']).add_suffix(f'_{i}')
for i, d in enumerate(dfs, start=1)
],
axis=1,
).reset_index(drop=True)
Output:
name_1 name_2 name_3 name_4
0 A NaN A NaN
1 B B B B
2 C C C C
3 NaN D NaN D