python, pandas, data-analysis, missing-data

How to fill missing gender data in proportion to the existing ratio in Python?


I want to fill the missing values of a gender column in a data set in proportion to the existing gender ratio.

I use boolean indexing together with head (or tail) to select the rows I want, but when I then call fillna on that selection, nothing changes. After some experimenting, fillna only seems to work when I don't use the boolean index. How can I select the first 3 empty values in the example below and fill them with 0?

import numpy as np
import pandas as pd
a = pd.DataFrame(np.random.randn(50).reshape((10, 5)))
a.loc[[1, 3, 4, 6, 9], 0] = np.nan
a[0][a[0].isnull()].head(3).fillna(value=0, inplace=True)

The NaN values in the DataFrame are not filled.
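A minimal reproduction of the behavior described above, assuming a recent pandas version. It sets up the NaNs with .loc rather than the chained form a[0][...] = np.nan, because chained assignment does not update a once Copy-on-Write is enabled. The check shows that the boolean-indexed selection is filled while the column in a is not:

```python
import numpy as np
import pandas as pd

# Same data as the question; .loc writes the NaNs into `a` itself
a = pd.DataFrame(np.random.randn(50).reshape((10, 5)))
a.loc[[1, 3, 4, 6, 9], 0] = np.nan

# Boolean indexing followed by head(3) produces a separate Series object...
subset = a[0][a[0].isnull()].head(3)
# ...so an in-place fill only modifies that separate object
subset.fillna(value=0, inplace=True)

print(subset.isna().sum())  # 0: the selection was filled
print(a[0].isna().sum())    # 5: column 0 of `a` still has all its NaNs
```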


Solution

  • Starting with data:

    a = pd.DataFrame(np.random.randn(50).reshape((10,5)))
    a.loc[[1, 3, 4, 6, 9], 0] = np.nan
    

    gives

              0         1         2         3         4
    0 -0.388759 -0.660923  0.385984  0.933920  0.164083
    1       NaN -0.996237 -0.384492  0.191026 -1.168100
    2 -0.773971  0.453441 -0.543590  0.768267 -1.127085
    3       NaN -1.051186 -2.251681 -0.575438  1.642082
    4       NaN  0.123432  1.063412 -1.556765  0.839855
    5 -1.678960 -1.617817 -1.344757 -1.469698  0.276604
    6       NaN -0.813213 -0.077575 -0.064179  1.960611
    7  1.256771 -0.541197 -1.577126 -1.723853  0.028666
    8  0.236197  0.868503 -1.304098 -1.578005 -0.632721
    9       NaN -0.227659 -0.857427  0.010257 -1.884986
    

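    Boolean indexing returns a new Series, so fillna(..., inplace=True) on that selection changes only the temporary copy, not the DataFrame a. Instead, work on column 0 directly: fillna with limit=3 fills at most the first 3 NaN values in the column, and the result is assigned back to the column. (Calling a[0].fillna(..., inplace=True) is chained assignment, which newer pandas versions warn about and, with Copy-on-Write, no longer apply to a.)

    a[0] = a[0].fillna(0, limit=3)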
    

    gives

              0         1         2         3         4
    0 -0.388759 -0.660923  0.385984  0.933920  0.164083
    1  0.000000 -0.996237 -0.384492  0.191026 -1.168100
    2 -0.773971  0.453441 -0.543590  0.768267 -1.127085
    3  0.000000 -1.051186 -2.251681 -0.575438  1.642082
    4  0.000000  0.123432  1.063412 -1.556765  0.839855
    5 -1.678960 -1.617817 -1.344757 -1.469698  0.276604
    6       NaN -0.813213 -0.077575 -0.064179  1.960611
    7  1.256771 -0.541197 -1.577126 -1.723853  0.028666
    8  0.236197  0.868503 -1.304098 -1.578005 -0.632721
    9       NaN -0.227659 -0.857427  0.010257 -1.884986
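For the original goal of filling missing gender values in proportion to the observed ratio, one option is to work out how many gaps each category should receive and write them back through .loc on the labels of the missing rows (the same boolean-index idea as in the question, but assigned to the Series itself). This is a minimal sketch, not a built-in pandas feature: the helper name fill_in_proportion, the "M"/"F" labels and the deterministic largest-remainder rounding are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fill_in_proportion(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs so the filled values follow the
    observed category ratio (deterministic largest-remainder rounding)."""
    missing = s.index[s.isna()]            # labels of the empty rows
    n = len(missing)
    out = s.copy()
    if n == 0:
        return out
    ratio = s.value_counts(normalize=True)  # observed share per category, NaN excluded
    raw = ratio * n                         # ideal (fractional) number of fills per category
    counts = np.floor(raw).astype(int)
    # Give the slots lost to rounding down to the largest fractional remainders
    leftover = int(n - counts.sum())
    top = (raw - counts).sort_values(ascending=False).index[:leftover]
    counts.loc[top] += 1
    # One label per missing row, most frequent category first
    fill_values = np.repeat(counts.index.to_numpy(), counts.to_numpy())
    out.loc[missing] = fill_values
    return out


# Hypothetical data: 4 "M", 2 "F" and 4 missing values
gender = pd.Series(["M", "F", "M", "M", np.nan, "F", np.nan, np.nan, "M", np.nan])
filled = fill_in_proportion(gender)
print(filled.value_counts())  # expected counts: M 7, F 3
```

Because the fill values are assigned in order, the earliest gaps receive the most frequent category; shuffling fill_values with a seeded numpy Generator before the assignment would mix them, and drawing with Generator.choice using p=ratio is a random alternative that matches the ratio only on average.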