See which words from a list are in each item using str.contains


I am trying to extract which words were found in a str.contains() search, using pandas and str.contains rather than VBA. The goal is to recreate the output of a VBA result column that lists the matched terms for each comment.

Here's what I was using to simply show me if the words were found in each comment:

searchfor = list(terms['term'])
found = [reviews['review_trimmed'].str.contains(x) for x in searchfor]
result = pd.DataFrame(found)

This is great in that I know which comments have the terms I'm looking for, but I don't know which terms it found for each. I would like my answer to utilize str.contains for consistency.


Solution

  • Using Grzegorz Skibinski's Setup

    import pandas as pd

    df = pd.DataFrame({
        "review_trimmed": [
            "dog and cat",
            "Cat chases mouse",
            "horrible thing",
            "noodle soup",
            "chilli",
            "pizza is Good"
        ]
    })
    
    searchfor = "yes cat Dog soup good bad horrible".split()
    
    df
    
         review_trimmed
    0       dog and cat
    1  Cat chases mouse
    2    horrible thing
    3       noodle soup
    4            chilli
    5     pizza is Good
    

    _______________________________________________________

    Solution (pandas.Series.str.findall)

    • Use '|'.join to combine all items searched for into a regex string that searches for any of the items.
    • Pass flags=2, which is the integer value of re.IGNORECASE, so the match is case-insensitive.

    df.review_trimmed.str.findall('|'.join(searchfor), 2)
    
    0    [dog, cat]
    1         [Cat]
    2    [horrible]
    3        [soup]
    4            []
    5        [Good]
    Name: review_trimmed, dtype: object
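
One caveat with joining raw terms into a regex: characters such as '+' or '.' are read as regex syntax, and a term like "cat" also matches inside "concatenate". The sketch below is one possible way to guard against both, assuming the terms are meant as literal whole words. The sample data and the names `reviews`, `terms`, `escaped`, and `whole_word` are hypothetical and not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical sample data, chosen to show the two pitfalls.
reviews = pd.Series(["dog and cat", "concatenate strings", "C++ is good"])
terms = ["cat", "C++", "good"]

# Escape each term so that characters like '+' are matched literally.
escaped = "|".join(re.escape(t) for t in terms)

# Substring matching: "cat" is also found inside "concatenate".
substring_hits = reviews.str.findall(escaped, flags=re.IGNORECASE)

# Whole-word matching: require that no word character sits directly
# before or after the match. Lookarounds are used instead of \b so
# that terms ending in punctuation (such as "C++") still match.
whole_word = r"(?<!\w)(?:" + escaped + r")(?!\w)"
whole_word_hits = reviews.str.findall(whole_word, flags=re.IGNORECASE)
```

Here `substring_hits` still reports "cat" for "concatenate strings", while `whole_word_hits` leaves that row empty.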
    

    We can join them with ';' like so:

    df.review_trimmed.str.findall('|'.join(searchfor), 2).str.join(';')
    
    0     dog;cat
    1         Cat
    2    horrible
    3        soup
    4            
    5        Good
    Name: review_trimmed, dtype: object
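
Since the question asks for str.contains specifically, here is a minimal sketch of an alternative that stays with it: build one boolean column per term, then collect the names of the terms that matched in each row. It reuses the setup above; the names `hits` and `found` are illustrative. Unlike findall, this reports each term as spelled in `searchfor` and in list order, not as it appears in the text.

```python
import pandas as pd

df = pd.DataFrame({
    "review_trimmed": [
        "dog and cat",
        "Cat chases mouse",
        "horrible thing",
        "noodle soup",
        "chilli",
        "pizza is Good",
    ]
})
searchfor = "yes cat Dog soup good bad horrible".split()

# One boolean column per search term; regex=False treats each term
# as a literal substring, case=False ignores case.
hits = pd.DataFrame({
    term: df["review_trimmed"].str.contains(term, case=False, regex=False)
    for term in searchfor
})

# For each row, join the terms whose column is True.
found = hits.apply(
    lambda row: ";".join(term for term in searchfor if row[term]),
    axis=1,
)
```

With this data, `found` holds "cat;Dog" for the first row and an empty string for "chilli".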