
Filter column values that contain all substrings


I am trying to select all crispy chicken sandwiches in my dataset. I have tried using this regex, but it still returns some grilled chicken sandwiches. Here is the code:

data_sandwich_crispy = data[data['Item'].str.contains(r'^(?=.*crispy)(?=.*sandwich)(?=.*chicken)', regex=True)]

And here is what the dataset looks like:

Any revision or link to an answer is really appreciated. I'm sorry if there is a mistake, and thank you for all your help!
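The pattern in the question already works as a logical AND: each `(?=.*word)` lookahead independently requires `word` to occur somewhere in the string. It does not check word order, whole words, or letter case, so any item that happens to contain all three substrings matches. A minimal sketch with Python's `re` module illustrates this; the item names are hypothetical (the real dataset is not shown), and the stricter `PHRASE` pattern is only one possible assumption about what "crispy chicken sandwich" should mean.

```python
import re

# Pattern from the question: each lookahead is an independent AND condition
# that looks for a substring anywhere in the string. Word order, word
# boundaries and letter case are not taken into account.
ALL_WORDS = re.compile(r'^(?=.*crispy)(?=.*sandwich)(?=.*chicken)')

# Hypothetical stricter pattern: "crispy" immediately followed by "chicken"
# as whole words, "sandwich" somewhere later, case-insensitive.
PHRASE = re.compile(r'\bcrispy\s+chicken\b.*\bsandwich\b', re.IGNORECASE)

# Hypothetical item names, for illustration only.
ITEMS = [
    "crispy chicken sandwich",
    "grilled chicken sandwich",
    "grilled chicken sandwich with crispy bacon",
    "Crispy Chicken Sandwich",
]


def matches(pattern, items):
    """Return one boolean per item: does the pattern match it?"""
    return [pattern.search(item) is not None for item in items]


if __name__ == "__main__":
    for item, a, p in zip(ITEMS, matches(ALL_WORDS, ITEMS), matches(PHRASE, ITEMS)):
        print(f"{item!r}: all-words={a}, phrase={p}")
```

The third item matches the all-words pattern even though it describes a grilled sandwich, while the fourth is missed only because of capitalization. Either pattern string can be passed to `Series.str.contains`, where `case=False` gives case-insensitive matching.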


Solution

  • If you want to collect only the rows containing crispy chicken sandwiches, have a look at the alternative solution below. It returns rows only when all three words (crispy, chicken and classic) are present as whole words:

    # Each (?=.*?\bWORD\b) lookahead requires WORD as a whole word anywhere in the string
    data_sandwich_crispy = df[df['item'].str.contains(r'^(?=.*?\bcrispy\b)(?=.*?\bchicken\b)(?=.*?\bclassic\b).*$', regex=True)]
    

    I created a simple dataframe as shown below:

    item                                        id
    premium crispy chicken classic sandwich     10
    premium grilled chicken classic sandwich    15
    premium club chicken classic sandwich       14
    

    Running the command above gives the following output:

    item                                        id
    premium crispy chicken classic sandwich     10
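If the list of required words changes often, the answer's lookahead pattern can be generated rather than written by hand. A minimal sketch, assuming a hypothetical helper named `contains_all_words` (not part of pandas) that escapes each word with `re.escape`, wraps it in `\b` word boundaries, and passes `na=False` so missing values count as non-matches. The sample rows are modelled on the answer's example dataframe.

```python
import re

import pandas as pd


def contains_all_words(series, words, case=True):
    """Boolean mask: True where every word in `words` occurs as a whole word.

    Hypothetical helper: it builds one lazy lookahead per word, the same shape
    as the answer's pattern, after escaping any regex metacharacters.
    """
    lookaheads = "".join(rf"(?=.*?\b{re.escape(w)}\b)" for w in words)
    pattern = rf"^{lookaheads}.*$"
    # na=False treats missing values as "no match" instead of returning NaN,
    # so the mask can be used directly for boolean indexing.
    return series.str.contains(pattern, case=case, regex=True, na=False)


# Sample rows modelled on the answer's example dataframe.
df = pd.DataFrame(
    {
        "item": [
            "premium crispy chicken classic sandwich",
            "premium grilled chicken classic sandwich",
            "premium club chicken classic sandwich",
        ],
        "id": [10, 15, 14],
    }
)

data_sandwich_crispy = df[contains_all_words(df["item"], ["crispy", "chicken", "classic"])]
print(data_sandwich_crispy)
```

Passing `case=False` to the helper forwards it to `str.contains`, so capitalized item names such as "Crispy Chicken" are matched as well.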