python nan data-cleaning

How to change values in specific rows/columns to NaN based on condition?


I’ve got some strange values in the date column of my dataset, and I’m trying to change these unexpected values to NaN.

I don’t know in advance what the unexpected values will be, which is why I made df2: I search for months (e.g. Dec, Mar), filter those rows out, and look at what is left. So now I know that the weird data is in rows 1 and 3. But how do I now change the Birthday column value for rows 1 and 3 to NaN?

My real dataset is much bigger so it’s a bit awkward to just type in the row numbers manually.

# Create the example df
import pandas as pd
data = {'Age': [20, 21, 19, 18],
        'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Birthday': ["Dec-82", "heidgo", "Mar-84", "ishosdg"]}
df = pd.DataFrame(data)


# Find out which rows have the weird values
df2 = df[~df["Birthday"].str.contains("Dec|Mar")]

Solution

  • Use df.loc with the negated condition as the row selector and 'Birthday' as the column, then assign NaN to the matching cells:

    import numpy as np

    df.loc[~df["Birthday"].str.contains("Dec|Mar"), 'Birthday'] = np.nan
    

       Age   Name Birthday
    0   20    Tom   Dec-82
    1   21   nick      NaN
    2   19  krish   Mar-84
    3   18   jack      NaN
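
Since the real dataset may contain months other than Dec and Mar, one possible variation is to let pandas try to parse every value and blank out whatever fails. The sketch below is not part of the original answer. It assumes that valid birthdays always look like "Dec-82" (English abbreviated month, a hyphen, two-digit year) and that any value not matching that format should become NaN; the name `parsed` is only illustrative.

```python
import pandas as pd

data = {'Age': [20, 21, 19, 18],
        'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Birthday': ["Dec-82", "heidgo", "Mar-84", "ishosdg"]}
df = pd.DataFrame(data)

# Assumption: valid values follow the "%b-%y" pattern (e.g. "Dec-82").
# errors="coerce" turns anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(df["Birthday"], format="%b-%y", errors="coerce")

# Keep the original string where parsing succeeded; all other cells become NaN.
df["Birthday"] = df["Birthday"].where(parsed.notna())

print(df)
```

This avoids listing month names by hand, but it is stricter than the `str.contains` check: a value such as "December 1982" would also be turned into NaN unless the format string is adjusted.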