So my idea is to apply the str.strip() function to each cell, so that the ? values are identified as missing and dropped, but they are still not recognized as missing values.
df = pd.read_csv('full_name', header=None, na_values=['?'])
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
df.dropna(axis=0, inplace=True, how='any')
df.head(20)
What is an efficient way to solve this?
dropna drops NaN values. Since your missing values are actually the string ?, you could replace them with NaN and use dropna:
df = df.replace('?', np.nan).dropna()
Or mask them and use dropna:
df = df.mask(df.eq('?')).dropna()
Or simply filter those rows out and select only the rows without any ?:
df = df[df.ne('?').all(axis=1)]
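To check that the three variants agree, here is a minimal sketch on a small made-up frame. The column names and values are hypothetical, not taken from your file, and the whitespace around some ? entries is an assumption about why na_values=['?'] did not catch them at read time. It uses pd.api.types.is_string_dtype instead of comparing against 'object' so the strip step also works where pandas stores text in a dedicated string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: '?' marks a missing value, sometimes padded with whitespace.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["Private", " ?", "State-gov", "Private"],
    "occupation": ["Sales", "Tech-support", "? ", "Craft-repair"],
})

# Strip surrounding whitespace from text columns, as in the question.
df = df.apply(
    lambda col: col.str.strip() if pd.api.types.is_string_dtype(col) else col
)

# 1) Turn '?' into NaN, then drop rows containing NaN.
by_replace = df.replace("?", np.nan).dropna()

# 2) Mask cells equal to '?' (they become missing), then drop those rows.
by_mask = df.mask(df.eq("?")).dropna()

# 3) Keep only rows where every cell differs from '?'.
by_filter = df[df.ne("?").all(axis=1)]

print(by_filter)
```

All three keep only rows 0 and 3 here. The filter variant never creates NaN values, so it leaves the column dtypes untouched, which may matter if you have integer columns.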