Search code examples
pythonpandasdelete-row

Pandas: Delete Rows in a dataframe if specific columns don't contain specific text


I have a df

     id  column_int column_int  column_A column_B column_C column_D
 0   1        int       int         ABC     ABC     Keep      na
 1   2        int       int         ABC     ABC     ABC       ABC
 2   3        int       int         ABC     Save    na        na
 3   4        int       int         ABC     Keep    na        na
 4   5        int       imt         ABC     ABC     ABC       ABC
 .
 . 

Where column_int are columns that contain ints and column A-D contain text values. I want to keep only the rows that have either Keep or Save as row values

Before:

 id  column_int column_int  column_A column_B column_C column_D
 0   1        int       int         ABC     ABC     Keep      na
 1   2        int       int         ABC     ABC     ABC       ABC
 2   3        int       int         ABC     Save    na        na
 3   4        int       int         ABC     Keep    na        na
 4   5        int       imt         ABC     ABC     ABC       ABC

After:

 id  column_int column_int  column_A column_B column_C column_D
 0   1        int       int         ABC     ABC     Keep      na
 2   3        int       int         ABC     Save    na        na
 3   4        int       int         ABC     Keep    na        na

I tried the following

for column in df:
    if type(column) == object:
        df = df[df[column].str.contains('Save')] | df[df[column].str.contains('Keep')]
    else:
        pass

Solution

  • You can use use .apply() on the selected columns, then for each column check for Save or Keep by str.contains. Then, use .any() on axis=1 (for row-wise operation) to check if the row contains such strings.

    Finally, filter by .loc, as follows:

    cols = ['column_A',  'column_B',  'column_C',  'column_D']
    
    df.loc[df[cols].apply(lambda x: x.str.contains(r'Save|Keep')).any(axis=1)]
    

    Result:

       id column_int column_int.1 column_A column_B column_C column_D
    0   1        int          int      ABC      ABC     Keep       na
    2   3        int          int      ABC     Save       na       na
    3   4        int          int      ABC     Keep       na       na