I have a list of strings that I need to match against a dataframe column.
The list looks as follows:
list = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
The column in the dataframe looks like this:
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
I'd like to find every row that contains every word of a string from the list, so that I end up with the following dataframe:
COLUMN | String
wcdma street view disconnected | street view wcdma
gbts planned work street view | street view gbts
lte atn golden village optical invalid | golden village lte
wcdma street view planned work | street view wcdma
What I tried was to turn each string in the list into a list of words (like ['street', 'view', 'wcdma']) and search with:
df.apply(lambda x: all(er in x.COLUMN for er in list), axis=1)
But it finds no matches, even when I know there must be at least one. It does return something if I change all() to any(), but that's not what I need.
import pandas as pd

list1 = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
# Split every search phrase into its individual words
list2 = [x.split(' ') for x in list1]
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
data = pd.DataFrame(data)

def search(x):
    # Words of the current row
    words = x.split(' ')
    for y in list2:
        # A phrase matches when all of its words occur in the row, in any order
        check = all(item in words for item in y)
        if check:
            return ' '.join(y)
    # No phrase matched this row
    return None

data['matched'] = data['COLUMN'].transform(search)
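Explanation: I first split each string into a list of words on spaces. Using transform() on 'COLUMN', I split each row the same way and use all() to check whether every word of a search phrase 'y' appears among the words of that row. If so, I return that phrase joined back into a string. Otherwise the row gets None. Note that only the first matching phrase is returned for each row.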
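The same all-words check can also be written as a subset test on sets. Below is a minimal sketch of that variant, not a replacement for the code above. The names `phrases`, `phrase_sets`, `first_match` and `rows` are made up for illustration, and plain Python lists stand in for the dataframe column so the snippet runs on its own.

```python
# Hypothetical names for illustration: phrases, phrase_sets, first_match, rows.
phrases = ['golden village lte', 'pones wcdma', 'coral gbts',
           'street view gbts', 'street view wcdma']

# Build one set of words per phrase up front, so it is not rebuilt for every row.
phrase_sets = [(p, set(p.split())) for p in phrases]

def first_match(text):
    """Return the first phrase whose words all occur in `text`, else None."""
    words = set(text.split())
    for phrase, needed in phrase_sets:
        if needed <= words:  # subset test: every needed word is present
            return phrase
    return None

rows = ['wcdma street view disconnected',
        'gbts planned work street view',
        'lte atn golden village optical invalid',
        'wcdma street view planned work']
results = [first_match(r) for r in rows]
print(results)
```

With the dataframe from above, the function could presumably be used as data['matched'] = data['COLUMN'].apply(first_match). Like the list-based version, it compares whole words only, so 'wcdma' would not match 'wcdma2'.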