
Why `pd.merge()` still works even though the `on` column is located in the index


Consider the following example:

import pandas as pd

df1 = pd.DataFrame(
   data={'A': range(5)}, 
   index=range(10, 15),
).rename_axis(index='ID')
df2 = df1.add(100).reset_index()

print(df1)
#     A
# ID   
# 10  0
# 11  1
# 12  2
# 13  3
# 14  4

print(df2)
#    ID    A
# 0  10  100
# 1  11  101
# 2  12  102
# 3  13  103
# 4  14  104

Here, we have two dataframes:

  • df1: Includes the ID as index
  • df2: Includes the ID as a column

To my surprise, pd.merge() still works:

result = df1.merge(df2, on='ID', left_index=False)

print(result)
#    ID  A_x  A_y
# 0  10    0  100
# 1  11    1  101
# 2  12    2  102
# 3  13    3  103
# 4  14    4  104

You can also leave out left_index=False, as it's the default. It still works.

However, df1 has no column named 'ID' (it only exists as the index), so I expected on='ID' to raise an error.

Am I missing something here?


Solution

  • From the documentation for df.merge() for the on parameter:

    Column or index level names to join on. These must be found in both DataFrames

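    The key phrase is "or index level names": on accepts the name of an index level as well as a column name.

    So, since the index of df1 in your example is also named ID, the merge still works.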
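
    A minimal sketch to check this, reusing df1 and df2 from the question. The index name 'key' and the variable names df1_other and df1_unnamed are hypothetical and chosen only for illustration. The idea is that once the index level is no longer called ID, on='ID' has nothing to match in the left frame and the merge should fail with a KeyError:

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(5)}, index=range(10, 15)).rename_axis(index='ID')
df2 = df1.add(100).reset_index()

# The index level of df1 is named 'ID', so on='ID' resolves to that level.
merged = df1.merge(df2, on='ID')

# Rename the index level ('key' is a hypothetical name): 'ID' is now neither
# a column nor an index level of the left frame.
df1_other = df1.rename_axis(index='key')
try:
    df1_other.merge(df2, on='ID')
    raised = False
except KeyError:
    raised = True

print(raised)

# With an unnamed index, the join has to be spelled out explicitly.
df1_unnamed = pd.DataFrame({'A': range(5)}, index=range(10, 15))
fallback = df1_unnamed.merge(df2, left_index=True, right_on='ID')
print(fallback)
```

    If the index has no name at all, on cannot refer to it, which is why the last call falls back to left_index=True together with right_on='ID'.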