My main string is in a DataFrame and the substrings are stored in a list. I want to find which substrings match. Here is the code I am using:
import pandas as pd
from textblob import TextBlob

sentence2 = "Previous study: 03/03/2018 (other hospital) Findings: Lung parenchyma: The study reveals evidence of apicoposterior segmentectomy of LUL showing soft tissue thickening adjacent surgical bed at LUL, possibly post operation."
blob_sentence = TextBlob(sentence2)
noun = blob_sentence.noun_phrases
df1 = pd.DataFrame(noun)
comorbidity_keywords = ["segmentectomy","lobectomy"]
matches =[]
for comorbidity_keywords[0] in df1:
    if comorbidity_keywords[0] in df1 and comorbidity_keywords[0] not in matches:
        matches.append(comorbidity_keywords)
This gives me a result that is not an actual match. The output should be "segmentectomy", but I get [0,'lobectomy']. I tried to follow the answer posted here: Check if multiple strings exist in another string. What am I doing incorrectly?
I don't really use TextBlob, but I have two methods that might help you get to your goal. Essentially, I'm splitting the sentence on single spaces and iterating over the resulting words to see if any of them match a keyword. One method returns a list of the matched words, and the other returns a dictionary mapping each match's index to the word.
### If you just want a list of words
def find_keyword_matches(sentence, keyword_list):
    s1 = sentence.split(' ')
    return [i for i in s1 if i in keyword_list]
Then:
find_keyword_matches(sentence2, comorbidity_keywords)
Output:
['segmentectomy']
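One limitation of splitting on a single space is that punctuation stays attached to the words (for example `LUL,`), and the comparison is case-sensitive. A minimal sketch of a more forgiving variant, assuming it is acceptable to tokenise with a regular expression; the name `find_keyword_matches_regex` is made up for this illustration:

```python
import re

def find_keyword_matches_regex(sentence, keyword_list):
    # Lower-case the keywords once so the comparison ignores case.
    keywords = {k.lower() for k in keyword_list}
    # \w+ extracts runs of letters, digits and underscores, dropping punctuation.
    tokens = re.findall(r"\w+", sentence)
    return [t for t in tokens if t.lower() in keywords]

text = "Evidence of prior Lobectomy, followed by segmentectomy."
print(find_keyword_matches_regex(text, ["segmentectomy", "lobectomy"]))
# ['Lobectomy', 'segmentectomy']
```

The matched words are returned as they appear in the sentence, so the original capitalisation is preserved.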
For a dictionary:
def find_keyword_matches(sentence, keyword_list):
    s1 = sentence.split(' ')
    # Map each matching word's position in the split list to the word itself
    return {s1.index(i): i for i in s1 if i in keyword_list}
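Output (the key is the word's position in the split list, so it depends on the exact spacing in the sentence):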
{17: 'segmentectomy'}
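Note that `list.index` only ever reports the first position of a word, so a keyword that appears more than once still yields a single entry. If every occurrence matters, something along these lines might work; this is only a sketch, and `keyword_positions` is a hypothetical helper name:

```python
from collections import defaultdict

def keyword_positions(sentence, keyword_list):
    keywords = set(keyword_list)
    positions = defaultdict(list)
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces do not produce empty strings.
    for idx, word in enumerate(sentence.split()):
        if word in keywords:
            positions[word].append(idx)
    return dict(positions)

text = "lobectomy was considered but segmentectomy was chosen over lobectomy"
print(keyword_positions(text, ["segmentectomy", "lobectomy"]))
# {'lobectomy': [0, 8], 'segmentectomy': [4]}
```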
Finally, a function that also prints where in the sentence a keyword is found, if at all:
def word_range(sentence, keyword):
    try:
        # str.index raises ValueError when the keyword is absent
        idx_start = sentence.index(keyword)
    except ValueError:
        return None
    idx_end = idx_start + len(keyword)
    print(f"Word '{keyword}' found within index range {idx_start} to {idx_end}")
    return keyword
Then use a nested list comprehension to get rid of the None values:
found_words = [x for x in [word_range(sentence2, i) for i in comorbidity_keywords] if x is not None]
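If you would rather keep working with the noun phrases than with the raw sentence, the same idea applies to a list of phrases: test each keyword as a substring of each phrase. The sketch below assumes the phrases are available as a plain Python list (for a single-column DataFrame, `df1[0].tolist()` should give one). The phrases shown are hand-written stand-ins, not actual TextBlob output, and `keywords_in_phrases` is a made-up name:

```python
def keywords_in_phrases(phrases, keyword_list):
    matches = []
    for keyword in keyword_list:
        # Substring test, so "segmentectomy" matches "apicoposterior segmentectomy".
        if any(keyword in phrase for phrase in phrases) and keyword not in matches:
            matches.append(keyword)
    return matches

# Hand-written stand-ins for noun phrases, not real TextBlob output.
phrases = ["previous study", "apicoposterior segmentectomy", "soft tissue"]
print(keywords_in_phrases(phrases, ["segmentectomy", "lobectomy"]))
# ['segmentectomy']
```

Unlike the loop in the question, this iterates over the keywords with a fresh loop variable and appends the single matching keyword rather than the whole keyword list.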