Search code examples
pythonpandasmissing-data

Get proportion of missing values per Country


enter image description here

I would like to find the proportion of missing values of my features on each country and on all years to select the countries.

I tried this:

df[indicators].isna().mean().sort_values(ascending=False)

but it gives me the proportion of missing values for each indicator only...

i would like this output :

enter image description here


Solution

  • You can use DataFrame.melt for reshape and then aggregate mean of missing values:

    df1 = (df.melt(id_vars='Country Name', value_vars=indicators)
             .set_index('Country Name')['value'].isna()
             .groupby('Country Name')
             .mean()
             .reset_index(name='Prop'))
    

    Or reshape by DataFrame.stack:

    df1 = (df.set_index('Country Name')[indicators]
             .stack(dropna=False)
             .isna()
             .groupby('Country Name')
             .mean()
             .reset_index(name='Prop')
            )
    

    Or use custom function:

    df1 = (df.groupby('Country Name')[indicators]
             .apply(lambda x: np.mean(x.isna().to_numpy(), axis=None))
             .reset_index(name='Prop'))