
Keep only rows whose value isn't in a list


I have a DataFrame of offers and their sales:

   offer                      sales
0  £10 off apple                 10
1  £10 off apple and samsung     20

I have a list of offers I want to exclude; in this example it contains only one offer:

remove_these_offers_list = ["£10 off apple"]

When I try to remove this offer using df.loc[~(df.offer.isin(remove_these_offers_list))], I get an empty DataFrame back, which I assume is because the string is contained in both rows.
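The difference between whole-value and substring matching can be seen in a minimal sketch, assuming a frame like the one above is rebuilt with pd.DataFrame. Series.isin() flags only row 0, while a substring test with str.contains() (not used in the question, shown only for comparison) flags both rows, which is one way to end up with an empty result.

```python
import pandas as pd

# Rebuild a frame like the one in the question
df = pd.DataFrame({
    "offer": ["£10 off apple", "£10 off apple and samsung"],
    "sales": [10, 20],
})
remove_these_offers_list = ["£10 off apple"]

# isin(): whole-value equality, so only row 0 matches
isin_mask = df["offer"].isin(remove_these_offers_list)
print(isin_mask.tolist())        # [True, False]
print(df.loc[~isin_mask])        # keeps row 1 only

# str.contains(): substring search, so both rows match
contains_mask = df["offer"].str.contains(remove_these_offers_list[0], regex=False)
print(contains_mask.tolist())    # [True, True]
print(df.loc[~contains_mask].empty)  # True: every row is dropped
```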

Expected Output

   offer                      sales
1  £10 off apple and samsung     20

Solution

  • Try stripping leading and trailing whitespace with str.strip() before the comparison, since stray spaces stop isin() from finding an exact match:

    df = df.loc[~df['offer'].str.strip().isin(remove_these_offers_list)]
    

    OR

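    Series.isin() compares whole values, not substrings, so the method you mentioned should already work on clean data. Another way is str.fullmatch() (available since pandas 1.1), which also requires the whole string to match; re.escape() makes characters such as . or + in an offer name match literally:

    import re

    df = df.loc[~df['offer'].str.fullmatch('|'.join(map(re.escape, remove_these_offers_list)))]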
    

    Output of df:

       offer                      sales
    1  £10 off apple and samsung     20
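Both suggestions can be checked end to end with a small sketch. The trailing space in row 0 is an assumption added to show a case where plain isin() finds nothing; the data in the question may not contain it. str.fullmatch() needs pandas 1.1 or later.

```python
import re

import pandas as pd

# Hypothetical data: row 0 carries a trailing space, which defeats exact matching
df = pd.DataFrame({
    "offer": ["£10 off apple ", "£10 off apple and samsung"],
    "sales": [10, 20],
})
remove_these_offers_list = ["£10 off apple"]

# Plain isin() misses row 0 because "£10 off apple " != "£10 off apple"
print(df["offer"].isin(remove_these_offers_list).tolist())  # [False, False]

# Option 1: strip leading/trailing whitespace, then compare whole values
kept_strip = df.loc[~df["offer"].str.strip().isin(remove_these_offers_list)]

# Option 2: anchored regex match; re.escape() keeps characters such as "." or "+"
# in an offer name from being read as regex syntax
pattern = "|".join(map(re.escape, remove_these_offers_list))
kept_full = df.loc[~df["offer"].str.strip().str.fullmatch(pattern)]

print(kept_strip)
print(kept_full)
```

Either option leaves only row 1 (sales 20). The strip is applied before fullmatch() as well, because fullmatch(), like isin(), compares the entire string.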