I’ve got some strange values in the date column of my dataset, and I’m trying to change these unexpected values to NaN.
I don’t know in advance what the unexpected values will be, which is why I made df2: I search for month names (e.g. Dec, Mar), filter out the rows that contain them, and look at what’s left. So now I know that the weird data is in rows 1 and 3. But how do I change the Birthday value for rows 1 and 3 to NaN?
My real dataset is much bigger, so typing in the row numbers manually isn’t practical.
# Create the example df
import pandas as pd
data = {'Age': [20, 21, 19, 18],
        'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Birthday': ["Dec-82", "heidgo", "Mar-84", "ishosdg"]}
df = pd.DataFrame(data)
# Find out which rows have the weird values
df2 = df[~df["Birthday"].str.contains("Dec|Mar")]
Locate the records that fit the condition and fill their Birthday column with NaN:
import numpy as np
df.loc[~df["Birthday"].str.contains("Dec|Mar"), 'Birthday'] = np.nan
   Age   Name Birthday
0   20    Tom   Dec-82
1   21   nick      NaN
2   19  krish   Mar-84
3   18   jack      NaN
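Since the unexpected values aren't known in advance, another option is to let pandas try to parse each entry instead of listing month names by hand. The following is only a sketch. It assumes every valid entry looks like "Dec-82" (abbreviated English month name, hyphen, two-digit year), which is a guess based on the example data. The name parsed is introduced here for illustration.

```python
import pandas as pd

data = {'Age': [20, 21, 19, 18],
        'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Birthday': ["Dec-82", "heidgo", "Mar-84", "ishosdg"]}
df = pd.DataFrame(data)

# Assumption: valid birthdays follow the "%b-%y" pattern, e.g. "Dec-82".
# errors="coerce" turns anything that does not parse into NaT.
parsed = pd.to_datetime(df["Birthday"], format="%b-%y", errors="coerce")

# Keep the original string where parsing succeeded; everything else becomes NaN.
df["Birthday"] = df["Birthday"].where(parsed.notna())

print(df)
```

This covers all twelve months without a pattern like "Dec|Mar" having to grow, but it is stricter: a value such as "December 1982" would also become NaN unless the format is adjusted.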