Pandas: find matching rows in two dataframes (without using `merge`)

Let's suppose I have these two dataframes with the same number of columns, but possibly different number of rows:

tmp = np.arange(0,12).reshape((4,3))
df = pd.DataFrame(data=tmp) 

tmp2 = {'a':[3,100,101], 'b':[4,4,100], 'c':[5,100,3]}
df2 = pd.DataFrame(data=tmp2)

print(df)
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

print(df2)
     a    b    c
0    3    4    5
1  100    4  100
2  101  100    3

I want to verify if the rows of df2 are matching any rows of df, that is I want to obtain a series (or an array) of boolean values that gives this result:

0     True
1    False
2    False
dtype: bool

I think something like the isin method should work, but I got this result, which results in a dataframe and is wrong:

print(df2.isin(df))
       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False

As a constraint, I wish to not use the merge method, since what I am doing is in fact a check on the data before applying merge itself. Thank you for your help!

Solution

You can use numpy.isin, which will compare all elements in your arrays and return True or False for each element for each array.

Then using all() on each array, will get your desired output as the function returns True if all elements are true:

>>> pd.Series([m.all() for m in np.isin(df2.values,df.values)])

0     True
1    False
2    False
dtype: bool

Breakdown of what is happening:

# np.isin
>>> np.isin(df2.values,df.values)

Out[139]: 
array([[ True,  True,  True],
       [False,  True, False],
       [False, False,  True]])

# all()
>>> [m.all() for m in np.isin(df2.values,df.values)]

Out[140]: [True, False, False]

# pd.Series()
>>> pd.Series([m.all() for m in np.isin(df2.values,df.values)])

Out[141]: 
0     True
1    False
2    False
dtype: bool