I am a new Data Scientist, and I am trying to write code that calculates the percentage of missing values in each column of a data frame.
Here is a reproducible example:
import pandas as pd

my_df = pd.DataFrame([[None, 2, 3],
                      [4, None, 6],
                      [7, 8, None]])
In this data frame, each column has one missing value out of three rows, i.e. 33.3% missing. The code I wrote to try to solve this is:
my_df.isnull().sum() / my_df.count()
This outputs 0.5 for every column. As I learned while writing it, count() ignores missing values and counts only non-null values, so each column is divided by 2 instead of 3.
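To see where the 0.5 comes from, here is a minimal sketch (rebuilding the same `my_df` from the question) that prints the three quantities involved side by side: the non-null count from `count()`, the missing count from `isnull().sum()`, and the total number of rows from `len()`. The variable names are illustrative only.

```python
import pandas as pd

my_df = pd.DataFrame([[None, 2, 3],
                      [4, None, 6],
                      [7, 8, None]])

# count() skips missing values, so each column reports only its non-null entries
non_null = my_df.count()
# isnull().sum() counts the missing entries in each column
missing = my_df.isnull().sum()
# len() is the total number of rows, whether or not they contain missing values
total_rows = len(my_df)

print(non_null.tolist())  # [2, 2, 2]
print(missing.tolist())   # [1, 1, 1]
print(total_rows)         # 3
```

Dividing `missing` by `non_null` gives 1/2 = 0.5, which is why the original expression overstates the share of missing values.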
How can I fix this so that the result is the correct 0.33 per column (one missing value out of three rows) rather than 0.5?
Thank you!
Give this a try. It divides by the total number of rows, len(my_df), instead of the non-null count:
my_df.isnull().sum()/len(my_df)
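A minimal sketch of the answer in action, again rebuilding the question's `my_df`. It also shows `isnull().mean()`, an equivalent alternative that is not part of the original answer: the mean of a boolean mask is the fraction of `True` values, so it yields the same per-column fraction. Multiplying by 100 turns the fraction into a percentage.

```python
import pandas as pd

my_df = pd.DataFrame([[None, 2, 3],
                      [4, None, 6],
                      [7, 8, None]])

# Missing count per column divided by the total number of rows
frac_missing = my_df.isnull().sum() / len(my_df)

# Alternative: the mean of the boolean mask is the share of missing entries
frac_missing_mean = my_df.isnull().mean()

# Express the fraction as a percentage, rounded to one decimal place
pct_missing = (frac_missing * 100).round(1)

print(pct_missing.tolist())  # [33.3, 33.3, 33.3]
```

Both forms use the full row count as the denominator, which is what the question needs; `mean()` just avoids writing `len(my_df)` explicitly.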