
Python: drop rows where any column ends with a given character


Is there a way to delete rows based on a condition without specifying each column individually, as in df.drop(df[Col1]...?

For example, can I iterate through Col1, Col2, ... through Col15 and delete every row in which any value ends with the letter "A"?

I was able to drop the columns whose names start with "A" using

df.loc[:, ~df.columns.str.startswith('A')]
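For reference, here is a minimal sketch of what that column filter does, on a small hypothetical frame (the column names are made up for illustration). It tests the column *labels*, not the cell values, keeps the columns whose names do not start with "A", and returns a new DataFrame rather than modifying df in place:

```python
import pandas as pd

# Hypothetical frame; the column names exist only to illustrate the filter
df = pd.DataFrame({"Apple": [1, 2], "Banana": [3, 4], "Avocado": [5, 6]})

# Boolean array over the column labels: True where the name starts with 'A'
starts_with_a = df.columns.str.startswith("A")

# .loc[:, ~mask] keeps all rows and only the columns where the mask is False;
# the result is a new DataFrame, so assign it back to keep the change
df = df.loc[:, ~starts_with_a]
print(df.columns.tolist())  # ['Banana']
```

Dropping rows by cell content needs a per-cell test instead, because no single label describes the whole row.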

Solution

  • IIUC, you have a pandas DataFrame and want to drop every row that contains at least one string ending with the letter 'A'. One vectorized way to do this is to build a boolean mask with numpy's np.char.endswith:

    import pandas as pd
    import numpy as np
    

    Suppose our df looks like this:

          0     1     2     3     4  5
    0  ADFC  FDGA  HECH  AFAB  BHDH  0
    1  AHBD  BABG  CBCA  AHDF  BCAG  1
    2  HEFH  GEHH  CBEF  DGEC  DGFE  2
    3  HEDE  BBHE  CCCB  DDGB  DCAG  3
    4  BGEC  HACB  ACHH  GEBC  GEEG  4
    5  HFCC  CHCD  FCBC  DEDF  AECB  5
    6  DEFE  AHCH  CHFB  BBAA  BAGC  6
    7  HFEC  DACC  FEDA  CBAG  GEDD  7
    

    Goal: we want to get rid of the rows with index 0, 1, 6 and 7, because each contains a value ending in 'A' (FDGA, CBCA, BBAA and FEDA).

    Try:

    mask = np.char.endswith(df.to_numpy(dtype=str), 'A')  # 2-D boolean ndarray, one entry per cell
    indices_true = df.index[mask.any(axis=1)]  # labels of rows with at least one match: 0, 1, 6, 7
    df.drop(indices_true, inplace=True)  # drop those rows in place
    print(df)
    
    out:
          0     1     2     3     4  5
    2  HEFH  GEHH  CBEF  DGEC  DGFE  2
    3  HEDE  BBHE  CCCB  DDGB  DCAG  3
    4  BGEC  HACB  ACHH  GEBC  GEEG  4
    5  HFCC  CHCD  FCBC  DEDF  AECB  5
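If you would rather stay entirely in pandas, a roughly equivalent sketch casts every cell to a string, tests each column with the vectorized .str.endswith accessor, and keeps only the rows where no cell matched. It rebuilds the sample frame by hand (column 5 is assumed to hold the integers 0 to 7, as the printout suggests) and uses boolean indexing instead of drop:

```python
import pandas as pd

# Rebuild the sample frame from the answer (column 5 assumed to hold ints 0..7)
rows = [
    ["ADFC", "FDGA", "HECH", "AFAB", "BHDH"],
    ["AHBD", "BABG", "CBCA", "AHDF", "BCAG"],
    ["HEFH", "GEHH", "CBEF", "DGEC", "DGFE"],
    ["HEDE", "BBHE", "CCCB", "DDGB", "DCAG"],
    ["BGEC", "HACB", "ACHH", "GEBC", "GEEG"],
    ["HFCC", "CHCD", "FCBC", "DEDF", "AECB"],
    ["DEFE", "AHCH", "CHFB", "BBAA", "BAGC"],
    ["HFEC", "DACC", "FEDA", "CBAG", "GEDD"],
]
df = pd.DataFrame(rows)
df[5] = list(range(len(df)))

# Per-cell test: cast each column to str so the integer column works with .str
ends_with_a = df.apply(lambda col: col.astype(str).str.endswith("A"))

# Keep only the rows in which no cell ends with 'A'
result = df[~ends_with_a.any(axis=1)]
print(result.index.tolist())  # [2, 3, 4, 5]
```

The key step in both versions is the same: reduce the per-cell boolean mask to one value per row with any(axis=1), then select or drop rows with that 1-D mask.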