Lambda with "not in" doesn't work for more than one word in Python


I would like to filter a DataFrame using a lambda with an if condition.

I have "product_name" and "category1" columns. If "product_name" does not contain any of the words "boxer", "boxers", "sock", or "socks", I would like to set "category1" to "Other". However, the code below sets every row to "Other", even rows whose name contains "sock".

import pandas as pd

df = pd.DataFrame({
    'product_name': ["blue shirt", " medium boxers", "red jackets ", "blue sock"],
})


df["category1"]=df.apply(lambda x: "Other" if ("boxer","boxers","sock","socks" not in x["product_name"] ) else x["category1"], axis=1)

I expected the result below:

df = pd.DataFrame({
    'product_name': ["blue shirt", " medium boxers", "red jackets ", "blue sock"],
    'category1': ["Other", np.nan, "Other", np.nan],
})

Thank you for your support


Solution

  • You could use str.contains:

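    import numpy as np

    items = ("boxer","boxers","sock","socks")

    # str.contains treats the joined string as a regex: boxer|boxers|sock|socks
    # Note: np.where builds a string array here, so np.nan ends up as the text 'nan'
    df["category1"] = np.where(df['product_name'].str.contains('|'.join(items)),
                               np.nan,  # value if True
                               'Other') # value if False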
    

    output:

         product_name category1
    0      blue shirt     Other
    1   medium boxers       nan
    2    red jackets      Other
    3       blue sock       nan
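
As for why the original lambda marks every row as "Other": the parenthesized expression is a four-element tuple in which only the last element is the result of "not in", and a non-empty tuple is always truthy. The sketch below illustrates this and shows one possible row-wise alternative using any(). It assumes the DataFrame from the question; the names condition and has_item are illustrative and not part of the original code. Unlike the np.where version, this variant leaves matching rows as real missing values rather than the text 'nan'.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["blue shirt", " medium boxers", "red jackets ", "blue sock"],
})
items = ("boxer", "boxers", "sock", "socks")

# The original condition is a tuple: three plain strings plus one boolean.
name = "blue socks"
condition = ("boxer", "boxers", "sock", "socks" not in name)
print(condition)        # a tuple, not a single boolean
print(bool(condition))  # truthy for every product name, because the tuple is non-empty

# Row-wise check: True if any keyword is a substring of the product name.
has_item = df["product_name"].apply(lambda n: any(word in n for word in items))

# Series.where keeps "Other" where no keyword was found and
# fills the remaining rows with a missing value.
df["category1"] = pd.Series("Other", index=df.index).where(~has_item)
print(df)
```

To test each keyword separately inside a lambda, the check has to be repeated per word (for example with any() or all()); listing the words before a single "not in" only tests the last one.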