Search code examples
pythonexcelpandasnumpy

Searching a column for a substring that matches a value from another column


let's say I have a table that looks something like this:

Value Longer_Value Third_Value Result
substring wrong_cell_sorry wrong_match
othertext substring_here match_this1
othertext secondone_here match_this2
secondone substring_there match_this3

I want to fill in the Result column with values of Third_Value column corresponding to values in the Longer_Value column if the value from the first column was found. If more than one was found, I would like to list them with some delimiter.

So the end table would look something like this:

Value Longer_Value Third_Value Result
substring wrong_cell_sorry wrong_match match_this1, match_this3
othertext substring_here match_this1 no_match
othertext secondone_here match_this2 no_match
secondone substring_there match_this3 match_this2

I've been trying various combinations of .loc and str.contains (with regex disabled) but so far without any luck. Any ideas how to approach this?


Solution

  • Does the below resolve your case?

    # Function to find matches and update the Result column
    def find_value(row, df):
        matches = df[df['Longer_Value'].str.contains(row['Value'], regex=False)]['Third_Value'].tolist()
        return ', '.join(matches) if matches else 'no_match'
    
    # Apply the function to each row in the DataFrame
    df['Result'] = df.apply(lambda row: find_value(row, df), axis=1)