Consider the following example:
import pandas as pd
df1 = pd.DataFrame(
    data={'A': range(5)},
    index=range(10, 15),
).rename_axis(index='ID')
df2 = df1.add(100).reset_index()
print(df1)
#     A
# ID
# 10  0
# 11  1
# 12  2
# 13  3
# 14  4
print(df2)
#    ID    A
# 0  10  100
# 1  11  101
# 2  12  102
# 3  13  103
# 4  14  104
Here, we have two dataframes:
df1: includes ID as the index
df2: includes ID as a column
To my surprise, pd.merge() still works:
result = df1.merge(df2, on='ID', left_index=False)
print(result)
#    ID  A_x  A_y
# 0  10    0  100
# 1  11    1  101
# 2  12    2  102
# 3  13    3  103
# 4  14    4  104
You can also leave out left_index=False, as it's the default; the merge still works.
However, there is no 'ID' column in df1, so on='ID' should raise an error.
Am I missing something here?
From the documentation of the on parameter of df.merge():
Column or index level names to join on. These must be found in both DataFrames
(emphasis mine)
So, since the index in your example is also named ID, the merge still works.
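
A quick way to convince yourself, as a minimal sketch: rebuild the frames from the question, then try the same merge after renaming the index of the left frame. The names df1_other, explicit and raised, and the index name 'other', are only illustrative and not part of the original example. With a different index name, 'ID' is neither a column nor an index level of the left frame, so I would expect pandas to raise a KeyError.

```python
import pandas as pd

# Same frames as in the question.
df1 = pd.DataFrame(
    data={'A': range(5)},
    index=range(10, 15),
).rename_axis(index='ID')
df2 = df1.add(100).reset_index()

# 'ID' is an index level name in df1 and a column in df2, so this works.
result = df1.merge(df2, on='ID')

# The more explicit spelling: turn the index into a column first.
explicit = df1.reset_index().merge(df2, on='ID')

# Rename the index level (the name 'other' is an arbitrary choice):
# now 'ID' cannot be found in the left frame at all.
df1_other = df1.rename_axis(index='other')
try:
    df1_other.merge(df2, on='ID')
    raised = False
except KeyError:
    raised = True

print(raised)
```

If you want the intent to be obvious to readers of your code, the reset_index() variant (or left_index=True together with right_on='ID') makes it clearer which side is joined on its index.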