
Python Pandas - Problem with removing nans


I am struggling to remove NaNs from a column. I have already spent some time searching for a solution, but nothing seems to work.

Below is a sample of my code. The whole notebook is on my GitHub: https://github.com/jarsonX/Temp_files/blob/main/W3-Exploratory%20Data%20Analysis(1).ipynb

import pandas as pd     
import seaborn as sns               #not used in this sample, needed for plotting later on
import matplotlib as mpl            #as above
import matplotlib.pyplot as plt     #as above
import numpy as np                  #as above

df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m2_survey_data.csv")

df.Age.describe()  #dtype float64

df['Age'].isna().value_counts()  #287 nans

df['Age'].dropna(how='any', inplace=True)  #trying to remove nans

df['Age'].isna().value_counts()  #still 287 nans

#Just for the sake of identification of rows
#I tried to print ONLY nans but could not figure out how to do it.
for i, el in enumerate(df.Age):
    print(i, el, type(el))

#The first nan is in the 67th row

What am I missing?
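For illustration, here is a minimal reproduction of the behavior in the code above. It uses a small, hypothetical DataFrame instead of the survey CSV (the Country column is made up) and shows that dropna(inplace=True) on the selected column shrinks only that Series object, while df keeps all of its rows.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the survey data.
df = pd.DataFrame({"Age": [25.0, np.nan, 31.0], "Country": ["PL", "US", "DE"]})

age = df["Age"]               # a Series holding the Age column
age.dropna(inplace=True)      # "inplace" applies to this Series object only

print(len(age))               # 2: the Series lost its NaN
print(len(df))                # 3: the DataFrame still has all its rows
print(df["Age"].isna().sum()) # 1: the NaN is still in df
```

Writing df['Age'].dropna(inplace=True) in one line behaves the same way; recent pandas versions may additionally warn about chained assignment.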

UPDATE:

I've managed to print only the NaNs:

for i, el in enumerate(df.Age):
    if el != el:  # NaN is the only value that is not equal to itself
        print(i, el, type(el))
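A more idiomatic way to list only the NaN rows, sketched here on a hypothetical stand-in DataFrame rather than the real survey data, is a boolean mask built with isna(). Indexing df with the mask keeps only the matching rows together with their index labels, so no manual counter is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; only the Age column matters here.
df = pd.DataFrame({"Age": [25.0, np.nan, 31.0, np.nan], "Country": ["PL", "US", "DE", "IN"]})

# Count the missing values (the same information as isna().value_counts()).
n_missing = df["Age"].isna().sum()
print(n_missing)

# Boolean mask: True where Age is NaN. Indexing df with it keeps only those rows.
nan_rows = df[df["Age"].isna()]
print(nan_rows)

# Just the index labels of the NaN rows.
print(nan_rows.index.tolist())
```

With the default RangeIndex that read_csv creates, these labels match the positions printed by the loop above.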

Solution

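  • You can try the following snippet. The problem is that calling dropna(inplace=True) on df['Age'] changes only that Series, so the rows of df are never removed. (The how argument has no effect on a Series either, since it is just a single column.) Instead, call dropna on the DataFrame itself and restrict the NaN check to the Age column with subset: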

    df.dropna(subset=["Age"], how="any", inplace=True)
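A small sketch of the fix, assuming a hypothetical three-row stand-in for the survey data, together with two equivalent alternatives that avoid inplace: reassigning the result of dropna, or keeping rows through a notna() mask. All three keep only the rows whose Age is present and leave the other columns unchecked.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data.
raw = pd.DataFrame({"Age": [25.0, np.nan, 31.0], "Country": ["PL", "US", "DE"]})

# Option 1: the answer's call, modifying the DataFrame in place.
df = raw.copy()
df.dropna(subset=["Age"], inplace=True)

# Option 2: same result without inplace; dropna returns a new DataFrame.
df2 = raw.dropna(subset=["Age"])

# Option 3: keep only rows where Age is present, using a boolean mask.
df3 = raw[raw["Age"].notna()]

# Surviving rows keep their original labels (0 and 2); renumber them if needed.
df_renumbered = df.reset_index(drop=True)

print(df["Age"].isna().sum())        # 0
print(df.index.tolist())             # [0, 2]
print(df_renumbered.index.tolist())  # [0, 1]
```

Reassignment (option 2) is often preferred over inplace=True because it makes it obvious which object holds the cleaned data.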