let's say I have a table that looks something like this:
Value | Longer_Value | Third_Value | Result |
---|---|---|---|
substring | wrong_cell_sorry | wrong_match | |
othertext | substring_here | match_this1 | |
othertext | secondone_here | match_this2 | |
secondone | substring_there | match_this3 |
I want to fill in the Result column with values of Third_Value column corresponding to values in the Longer_Value column if the value from the first column was found. If more than one was found, I would like to list them with some delimiter.
So the end table would look something like this:
Value | Longer_Value | Third_Value | Result |
---|---|---|---|
substring | wrong_cell_sorry | wrong_match | match_this1, match_this3 |
othertext | substring_here | match_this1 | no_match |
othertext | secondone_here | match_this2 | no_match |
secondone | substring_there | match_this3 | match_this2 |
I've been trying various combinations of .loc and str.contains (with regex disabled) but so far without any luck. Any ideas how to approach this?
Does the below resolve your case?
# Function to find matches and update the Result column
def find_value(row, df):
matches = df[df['Longer_Value'].str.contains(row['Value'], regex=False)]['Third_Value'].tolist()
return ', '.join(matches) if matches else 'no_match'
# Apply the function to each row in the DataFrame
df['Result'] = df.apply(lambda row: find_value(row, df), axis=1)