python pandas stringio

Used dropna(subset) but an error occurred


I'm practicing data preprocessing with the dropna method.

I defined csv_data and read it into a DataFrame:

import pandas as pd
from io import StringIO

csv_data = \
'''A, B, C, D
1.0, 2.0, 3.0, 4.0
5.0, 6.0,, 8.0
10.0, 11.0, 12.0,'''

df = pd.read_csv(StringIO(csv_data))

Then I tried df.dropna(subset=['C']) to drop the rows where NaN appears in column 'C'.

But I got the following error:

df.dropna(subset=['C'])
Traceback (most recent call last):

Input In [50] in <cell line: 1>
df.dropna(subset=['C'])

File C:\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311 in wrapper
return func(*args, **kwargs)

File C:\Anaconda3\lib\site-packages\pandas\core\frame.py:6002 in dropna
raise KeyError(np.array(subset)[check].tolist())

KeyError: ['C']

Has anyone run into this error?


Solution

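  • It looks like your column names contain leading whitespace, which needs to be stripped before calling dropna. The header line is "A, B, C, D", and read_csv keeps the space after each comma as part of the label, so the column is actually named ' C', not 'C'. If you check your current column names, you will see this: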

    >>> df.columns
    Index(['A', ' B', ' C', ' D'], dtype='object')
                      ^^^
    

    One approach is to strip the spaces from the column names:

    df.columns = df.columns.str.strip()
    

    Alternatively, you can pass the exact column name (including the space):

    df.dropna(subset=[' C'])
                      ^^^^
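Here is a minimal end-to-end sketch of the fix above, reusing the csv_data from the question. The first option is the str.strip() approach from the answer. The second option, read_csv's skipinitialspace=True parameter, is an addition not taken from the original answer; it tells the parser to skip whitespace immediately after each delimiter, which is enough here because the stray spaces only ever follow a comma.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question: note the space after each comma.
csv_data = '''A, B, C, D
1.0, 2.0, 3.0, 4.0
5.0, 6.0,, 8.0
10.0, 11.0, 12.0,'''

# Reading it as-is keeps the leading spaces in the header labels.
df = pd.read_csv(StringIO(csv_data))
print(df.columns.tolist())  # ['A', ' B', ' C', ' D']

# Option 1 (from the answer): strip the labels, then drop rows with NaN in 'C'.
df.columns = df.columns.str.strip()
cleaned = df.dropna(subset=['C'])
print(cleaned.index.tolist())  # [0, 2]  (row 1 had an empty 'C' field)

# Option 2 (an assumption, not from the answer): have read_csv skip the
# whitespace after each delimiter so the labels are clean from the start.
df2 = pd.read_csv(StringIO(csv_data), skipinitialspace=True)
print(df2.columns.tolist())  # ['A', 'B', 'C', 'D']
print(df2.dropna(subset=['C']).index.tolist())  # [0, 2]
```

Stripping after the fact works regardless of how the file was produced, while skipinitialspace only helps when the extra whitespace sits right after the delimiter.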