Search code examples
python-3.xpandassklearn-pandas

Finding common rows between two dataframes based on a column using pandas


I have two dataframes. I need to extract rows based on common values in column 'a'. However, instead of creating one data frame at the end I want to retain the two data frames.

For example:

###Consider the following input
df1 = pd.DataFrame({'a':[0,1,1,2,3,4], 'b':['q','r','s','t','u','v'],'c':['a','b','c','d','e','f']})

df2 = pd.DataFrame({'a':[1,4,5,6], 'b':['qq','rr','ss','tt'],'c':[1,2,3,4]})

The expected output is:


###df1:
      a.      b.     c
0.    1.      r.     a
1.    1.      s.     c 
2.    4.      v.     f


###df2:
      a.      b.     c
0.    1.      qq     1
1.    4.      rr     2


How can I achieve the following result? Insights will be appreciated.


Solution

  • You can generalize it with numpy's intersect1d

    import numpy as np
    
    intersection_arr = np.intersect1d(df1['a'], df2['a'])
    
    df1 = df1.loc[df1['a'].isin(intersection_arr),:]
    df2 = df2.loc[df2['a'].isin(intersection_arr),:]
    

    More than two dataframes:

    import numpy as np
    from functools import reduce
    
    intersection_arr = reduce(np.intersect1d, (df1['a'], df2['a'], df3['a']))
    
    df1 = df1.loc[df1['a'].isin(intersection_arr),:]
    df2 = df2.loc[df2['a'].isin(intersection_arr),:]
    df3 = df3.loc[df3['a'].isin(intersection_arr),:]