Search code examples
pythonpandasdataframerowsdelete-row

How to delete rows that contain only certain values?


I have a dataframe like this

    column_name 
0   OnePlus phones never fail to meet my expectatiion.  
1   received earlier than expected for local set.   
2   \n  
3   good    
4   must buy!
5   \t
6     
7   awesome product!  
8     \n    

I want to remove all rows which contain ONLY \n, \t, , \n in them.

Output should be like this:

    column_name 
0   OnePlus phones never fail to meet my expectatiion.  
1   received earlier than expected for local set.   
2   good    
3   must buy!
4   awesome product!

I tried the following method:

  df = df[df.column_name != '\n'].reset_index(drop=True)
  df = df[df.column_name != ''].reset_index(drop=True)
  df = df[df.column_name != ' '].reset_index(drop=True)
  df = df[df.column_name != '   '].reset_index(drop=True)
  df = df[df.column_name != ' \n '].reset_index(drop=True)

But is there a more elegant way or a pythonic way to do this instead of repeating the code?


Solution

  • You can use Series.str.strip and compare only empty strings:

    df1 = df[df.column_name.str.strip() != ''].reset_index(drop=True)
    

    Or convert empty values to boolean:

    df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True)
    

    Or filter words, for me was necessary strip (maybe in real data strip should be removed):

    df1 = df[df.column_name.str.strip().str.contains('\W', na=False)].reset_index(drop=True)
    

    If need remove missing values and no string values replace these values to NaNs and then use DataFrame.dropna:

    df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
    df1 = df.dropna(subset=['column_name']).reset_index(drop=True)