I am trying to extract which words were found by a str.contains() search, recreating the output of the VBA result column from my original screenshot, but using pandas and str.contains rather than VBA.
Here's what I was using to show whether the words were found in each comment:
import pandas as pd

searchfor = list(terms['term'])
# one boolean Series per term: True where the review contains that term
found = [reviews['review_trimmed'].str.contains(x) for x in searchfor]
result = pd.DataFrame(found)
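The list comprehension produces one boolean Series per term, so pd.DataFrame(found) ends up with one row per term and one column per review. A minimal sketch that makes this orientation explicit by labeling each row with its term; the small terms and reviews frames here are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the real `terms` and `reviews` DataFrames
terms = pd.DataFrame({"term": ["cat", "dog", "soup"]})
reviews = pd.DataFrame({"review_trimmed": ["dog and cat", "noodle soup", "chilli"]})

searchfor = list(terms["term"])
found = [reviews["review_trimmed"].str.contains(x) for x in searchfor]

# Rows are terms, columns are reviews; relabel the rows with the term each one tested
result = pd.DataFrame(found)
result.index = searchfor
print(result)
```

Note that str.contains is case-sensitive by default, so "Cat" would not match the term "cat" here.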
This tells me which comments contain the terms I'm looking for, but not which terms were found in each one. I would like the answer to use str.contains for consistency.
import pandas as pd

df = pd.DataFrame({
"review_trimmed": [
"dog and cat",
"Cat chases mouse",
"horrible thing",
"noodle soup",
"chilli",
"pizza is Good"
]
})
searchfor = "yes cat Dog soup good bad horrible".split()
df
review_trimmed
0 dog and cat
1 Cat chases mouse
2 horrible thing
3 noodle soup
4 chilli
5 pizza is Good
Use pandas.Series.str.findall, with '|'.join to combine all the search terms into a single regex that matches any of them. The second argument, flags=2, is re.IGNORECASE.
df.review_trimmed.str.findall('|'.join(searchfor), 2)
0 [dog, cat]
1 [Cat]
2 [horrible]
3 [soup]
4 []
5 [Good]
Name: review_trimmed, dtype: object
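Joining the raw terms with '|' assumes none of them contain regex metacharacters, and it also matches inside longer words ("cat" would match "category"). A sketch of a more defensive pattern, assuming whole-word matches are what you want: re.escape each term, wrap the alternation in \b word boundaries, and pass flags=re.IGNORECASE by name instead of the bare 2. The extra "category error" row is a hypothetical addition to show the word-boundary effect.

```python
import re
import pandas as pd

df = pd.DataFrame({"review_trimmed": [
    "dog and cat", "Cat chases mouse", "horrible thing",
    "noodle soup", "chilli", "pizza is Good",
    "category error",  # hypothetical row: "cat" appears only inside a longer word
]})
searchfor = "yes cat Dog soup good bad horrible".split()

# re.escape neutralizes metacharacters in the terms; \b...\b limits matches to whole
# words; the non-capturing group makes findall return the full match text.
pattern = r"\b(?:" + "|".join(map(re.escape, searchfor)) + r")\b"

# re.IGNORECASE is the flag whose integer value is 2
matches = df["review_trimmed"].str.findall(pattern, flags=re.IGNORECASE)
print(matches)
```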
We can join them with ';' like so:
df.review_trimmed.str.findall('|'.join(searchfor), 2).str.join(';')
0 dog;cat
1 Cat
2 horrible
3 soup
4
5 Good
Name: review_trimmed, dtype: object
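If you specifically want to stay with str.contains, one possible sketch is to build one boolean column per term and then, for each review, join the names of the columns that are True. Here case=False stands in for the IGNORECASE flag and regex=False treats each term as a literal substring; the "found" column name is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"review_trimmed": [
    "dog and cat", "Cat chases mouse", "horrible thing",
    "noodle soup", "chilli", "pizza is Good",
]})
searchfor = "yes cat Dog soup good bad horrible".split()

# One boolean column per search term (column name = term)
hits = pd.concat(
    {term: df["review_trimmed"].str.contains(term, case=False, regex=False)
     for term in searchfor},
    axis=1,
)

# For each review, keep the terms whose column is True and join them with ';'
df["found"] = hits.apply(lambda row: ";".join(row[row].index), axis=1)
print(df)
```

Unlike findall, this reports the terms as spelled in searchfor and in searchfor order (e.g. "cat;Dog" for the first row), and as a plain substring test it still matches inside longer words.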