Search code examples
pythonpandasdata-analysis

How to apply a Pandas filter on a data frame based on entries in a column from a different data frame (no join)


As an example, I have one data frame (df_1) with one column which contains some text data. The second data frame (df_2) contains some numbers. How do I check if the text contains the numbers from the second data frame?

df_1

                       Note
0  The code to this is 1003
1  The code to this is 1004

df_2

   Code_Number
0         1006
1         1003

So I want to check if the entries in [Note] from df_1 contains the entries from [Code_Number] from df_2

I have tried using the following code: df_1[df_1['Note'].str.contains(df_2['Code_Number'])] and I know I cannot use a join as I do not have a key to join on.

The final result which I am looking for after the filtering has been applied is:

   Note              
0  The code to this is 1003    

Solution

  • Try this, and see if it covers your use case: get a cross cartesian of both columns, using itertools' product and filter based on the condition:

    from itertools import product
    m = [ left for left, right
          in product(df.Note,df1.Code_Number) 
          if str(right) in left]
    
    pd.DataFrame(m,columns=['Note'])
    
                   Note
    0   The code to this is 1003