python pandas dataframe data-cleaning

Pandas deleting rows based on same string in columns


   Manufacturer          Buy Box Seller
0  Goli                  Goli Nutrition Inc.
1  Hanes                 3rd Street Brands
2  NaN                   Inspiring Life
3  Sports Research       Sports Research
4  Beckham Luxury Linen  Thalestris Co.

Hello, I am using a pandas DataFrame to clean this file and want to delete the rows whose Buy Box Seller value contains the manufacturer's name. For example, the first row (index 0) will be deleted because its Buy Box Seller column contains the string 'Goli'.


Solution

  • There are missing values, so first replace them with DataFrame.fillna. Then use DataFrame.apply with axis=1 to test, row by row, whether the Manufacturer value is not in the Buy Box Seller value, and filter with boolean indexing:

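    # Fill NaN with a placeholder so the substring test always compares strings
    mask = (df.fillna('Missing vals')
              .apply(lambda x: x['Manufacturer'] not in x['Buy Box Seller'], axis=1))
    # Keep only rows where the manufacturer name is absent from the seller name
    df = df[mask]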
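
To try the approach end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the construction of `df` and the names `filled` and `result` are assumptions for illustration, not part of the original answer) and applies the same fillna/apply/boolean-indexing steps. The match is a plain case-sensitive substring test.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption,
# the real data presumably comes from a file.
df = pd.DataFrame({
    "Manufacturer": ["Goli", "Hanes", np.nan, "Sports Research", "Beckham Luxury Linen"],
    "Buy Box Seller": ["Goli Nutrition Inc.", "3rd Street Brands", "Inspiring Life",
                       "Sports Research", "Thalestris Co."],
})

# Replace NaN with a placeholder so that `in` always compares two strings.
filled = df.fillna("Missing vals")

# True for rows whose manufacturer name does NOT occur in the seller name.
mask = filled.apply(lambda row: row["Manufacturer"] not in row["Buy Box Seller"], axis=1)

# Filter the original frame so the kept rows still show NaN, not the placeholder.
result = df[mask]
print(result)
```

With this sample, the rows at index 0 and 3 are dropped and the rows at index 1, 2 and 4 remain. One caveat of the placeholder: a row where both columns are missing would be filled with the same string on both sides and therefore dropped as well.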