Search code examples
pythonstringpandasdataframedata-analysis

How to check entries in a column for patterns and calculate the number of patterns?


I have a DataFrame:

         Name         Price
0        Dictionary     3
1        Book           4
2        Dict En-Ru     2
3        BookforKids    6
4        Dict FR-CHN    1

I need a piece of code that will check the column 'Name' for patterns that I can specify myself and will count the number of founded patterns in another DataFrame.

For instance, check the number of entries in the 'Name' column with the patterns Dict an Book ignoring the case should give this result:

|  Pattern    | Occurencies |
| ----------- | ----------- |
| Dict        | 3           |
| Book        | 2           |

Solution

  • Here's one way using str.extract:

    patterns = ['Dict','Book']
    df.Name.str.extract(rf"({'|'.join(patterns)})", expand=False).value_counts()
    
    Dict    3
    Book    2
    Name: 0, dtype: int64
    

    You can make it case insensitive with the flags argument:

    patterns_lower = '|'.join([s.lower() for s in patterns])
    (df.Name.str.lower().str.extract(rf"({patterns_lower})", expand=False)
            .value_counts())