python regex conditional-statements string-search

How to identify strings that contain multiple words


I have a data frame with a string column that contains sentences. I want to extract the rows that contain certain words, regardless of where in the sentence they occur.

For example:

Column
Cat and mouse are the born enemies
Cat is a furry pet


df = df[df['cleantext'].str.contains('cat' & 'mouse')].reset_index()
df.shape

The above is throwing an error.

I know that for an OR condition we can write:

df = df[df['cleantext'].str.contains('cat|mouse')].reset_index()

But I want to extract only the rows where both cat and mouse are present.

Expected output:

Column
Cat and mouse are the born enemies

Solution

  • Here's one approach, which also works for multiple words:

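    import pandas as pd

    words = ['cat', 'mouse']
    # Build one boolean column per word (lowercased for case-insensitive matching),
    # then keep only the rows where every word was found.
    m = pd.concat([df.Column.str.lower().str.contains(w) for w in words], axis=1).all(axis=1)
    df.loc[m, :]
    
                                   Column
    0  Cat and mouse are the born enemies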
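
    For reference, the same idea can be written as a self-contained script. This is only a sketch: the helper name rows_containing_all is made up for illustration, the sample frame is rebuilt from the two sentences in the question, and it combines one boolean mask per word with & instead of pd.concat. It assumes plain substring matching is acceptable.

```python
import pandas as pd


def rows_containing_all(df, column, words):
    """Return the rows of df whose `column` contains every word in `words`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Start with every row selected, then narrow the mask word by word.
    mask = pd.Series(True, index=df.index)
    for word in words:
        # case=False ignores case, regex=False treats the word as a literal
        # substring, and na=False counts missing values as "no match".
        mask &= df[column].str.contains(word, case=False, regex=False, na=False)
    return df[mask]


# Sample data taken from the question.
df = pd.DataFrame(
    {"Column": ["Cat and mouse are the born enemies", "Cat is a furry pet"]}
)

result = rows_containing_all(df, "Column", ["cat", "mouse"])
print(result)
```

    Because this is substring matching, 'cat' would also match a word such as 'category'. If whole-word matching matters, a regex with word boundaries would be needed instead.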