Search code examples
pythonpython-3.xpandasdataframesubstring

Searching a dataframe to see if a substring exists in another dataframe using pandas


I'm having trouble to find a solution to this issue where I have 2 dataframes, df1 and df2.

df1
    C1      C2
0   apple   5
1   grape   11
2   orange  7
3   cherry  9

df2
    Item
0   apple soda
1   cherry cola
2   grape juice
3   apple candy
4   bubblegum apple
5   orange citrus
6   funky grapefruit
7   sweet banana

How do I search the strings in df2 and check if a substring exists in df1 which then returns the value from df1 into a new column in df2?

Here's the result I'm trying to achieve:

df2
    Item                C2
0   apple soda          5
1   cherry cola         9
2   grape juice         11
3   apple candy         5
4   bubblegum apple     5     
5   orange citrus       7  
6   funky grapefruit    11     
7   sweet banana        Null 

Solution

  • Here is solution:

    import pandas as pd
    data1 = {'C1': ['apple', 'grape', 'orange', 'cherry'],
             'C2': [5, 11, 7, 9]}
    
    data2 = {'Item': ['apple soda', 'cherry cola', 'grape juice', 'apple candy',
                      'bubblegum apple', 'orange citrus', 'funky grapefruit', 'sweet banana']}
    
    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)
    
    def find_matching_value(row):
        matching_rows = df1[df1['C1'].apply(lambda x: x in row['Item'])]
        if not matching_rows.empty:
            return matching_rows['C2'].iloc[0]
        return None
    
    df2['C2'] = df2.apply(find_matching_value, axis=1)
    
    print(df2)
    

    Output:

              Item    C2
    0        apple soda   5.0
    1       cherry cola   9.0
    2       grape juice  11.0
    3       apple candy   5.0
    4   bubblegum apple   5.0
    5     orange citrus   7.0
    6  funky grapefruit  11.0
    7      sweet banana   NaN