python string pandas dataframe string-matching

How to find all rows in a dataframe that contain a substring?


I have a single word and a pandas DataFrame with a column of string values. I'm trying to find the rows of that DataFrame whose string value contains that word.

I read about the extractall() method, but I'm not sure how to use it or whether it's even the right answer.


Solution

  • Using this test data (modified and borrowed from Chris Albon):

    import pandas as pd

    # Test data: three regiments, two of which contain "goons" in some form
    raw_data = {'regiment': ['Nighthawks Goons', 'Nighthawks Goons', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
                'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
                'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
                'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
                'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
    

    You can use this to find only the rows that contain goons as a whole word, ignoring case (the \b word boundaries keep it from also matching Dragoons):

    df[df['regiment'].str.contains(r"\bgoons\b", case = False)]
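
To make the difference between a plain substring match and the whole-word match above concrete, here is a minimal sketch. It uses a small illustrative frame rather than the full test data, and the names `word`, `substring_mask`, `word_mask`, `substring_rows` and `word_rows` are only for illustration. It also assumes the search word may contain regex metacharacters and that the column may hold missing values, neither of which is stated in the question.

```python
import re

import pandas as pd

# Small illustrative frame (hypothetical data, including one missing value).
df = pd.DataFrame(
    {"regiment": ["Nighthawks Goons", "Nighthawks", "Dragoons", "Scouts", None]}
)

word = "goons"  # the search word; assumed here, it could come from user input

# 1) Plain substring match: also hits "Dragoons", since "goons" occurs inside it.
#    regex=False treats the word literally; na=False turns missing values into False
#    so the result can be used directly as a boolean mask.
substring_mask = df["regiment"].str.contains(word, case=False, regex=False, na=False)

# 2) Whole-word match: \b anchors the pattern at word boundaries.
#    re.escape protects against regex metacharacters in the word.
pattern = rf"\b{re.escape(word)}\b"
word_mask = df["regiment"].str.contains(pattern, case=False, na=False)

substring_rows = df[substring_mask]
word_rows = df[word_mask]

print(substring_rows["regiment"].tolist())  # ['Nighthawks Goons', 'Dragoons']
print(word_rows["regiment"].tolist())       # ['Nighthawks Goons']
```

As for extractall(): it returns the matched capture groups as a new DataFrame rather than a boolean mask, so str.contains() is likely the more direct fit when the goal is only to filter rows.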