python pandas numpy random list-comprehension

Loop function for missing data

I want to replace the NaN values with values drawn from np.random.normal(mu, s, n) using a list comprehension, but I couldn't get it to work.

df_column_values = ["NaN","1","NaN","2","NaN","3","94","4","168","5","NaN"]

import numpy as np

n, mu, sigma = 700, 155, 118
array = np.random.normal(mu, sigma, n)
for i in array:
    if i > 0 and i < 400:
        # replaces every 0 in the column (0 marks a missing value here) with i
        data['Insulin'].replace(0, i, inplace=True)

This code runs, but every NaN ends up with the same value. How can I improve it?
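A minimal reproduction of why this happens, assuming a small hypothetical Series where 0 marks a missing value (as in the loop above) and a seeded generator so the run is repeatable. The first in-range draw replaces all zeros at once, so every later iteration finds nothing left to replace:

```python
import numpy as np
import pandas as pd

# Hypothetical column where 0 marks a missing value, as in the loop above
s = pd.Series([0, 1, 0, 2, 0, 3], dtype=float)

rng = np.random.default_rng(0)
draws = rng.normal(155, 118, 700)

for i in draws:
    if 0 < i < 400:
        # The first in-range draw replaces *every* 0 at once;
        # afterwards no 0 is left, so later iterations change nothing.
        s = s.replace(0, i)

first = next(i for i in draws if 0 < i < 400)
print(s.tolist())  # all former zeros hold the same value, `first`
```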

Raw data from Kaggle

Solution

It looks like you want to replace missing values with normally distributed random values restricted to the range (0, 400). A truncated normal distribution does exactly that.
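A quick sketch of how scipy.stats.truncnorm is parameterized, reusing the bounds and moments from the question (the seed and sample size are arbitrary choices). The clip points are passed in standard-deviation units relative to loc, not in data units:

```python
import numpy as np
import scipy.stats as stats

lower, upper = 0, 400
mu, sigma = 155, 118

# truncnorm expects the clip points as (bound - mu) / sigma
a, b = (lower - mu) / sigma, (upper - mu) / sigma
X = stats.truncnorm(a, b, loc=mu, scale=sigma)

samples = X.rvs(size=10_000, random_state=42)
print(a, b)                          # roughly -1.31 and 2.08
print(X.support())                   # back in data units: about (0, 400)
print(samples.min(), samples.max())  # never outside the bounds
print(X.mean())                      # a bit above mu
```

Because more of the left tail is cut off than the right, the mean of the truncated distribution is slightly above mu, not exactly 155.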

Then draw a vector of random values with the same length as the column, so that every row gets its own independent sample, and use np.where to keep the random value only where the entry is missing.
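If you would rather stay with np.random.normal and a list comprehension, as the question attempted, rejection sampling gives the same truncated distribution: draw, discard anything outside (0, 400), and repeat until there is one value per missing entry. The helper name `truncated_normal` is hypothetical, and the list `df_column_values` is the one from the question:

```python
import numpy as np

def truncated_normal(mu, sigma, n, lower, upper, rng):
    """Hypothetical helper: n draws from N(mu, sigma) restricted to
    (lower, upper), by discarding out-of-range values and redrawing."""
    kept = []
    while len(kept) < n:
        batch = rng.normal(mu, sigma, n)
        # list-comprehension filter: keep only in-range draws
        kept.extend([x for x in batch if lower < x < upper])
    return np.array(kept[:n])

df_column_values = ["NaN","1","NaN","2","NaN","3","94","4","168","5","NaN"]
n_missing = sum(v == "NaN" for v in df_column_values)

rng = np.random.default_rng(1)
fills = iter(truncated_normal(155, 118, n_missing, 0, 400, rng))

# One fresh draw per "NaN"; other entries are converted to float
filled = [next(fills) if v == "NaN" else float(v) for v in df_column_values]
print(filled)
```

With these parameters nearly 90% of the normal's mass lies inside (0, 400), so few draws are wasted; for a narrow interval far from mu, truncnorm would be the more efficient choice.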

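import numpy as np
import pandas as pd
import scipy.stats as stats

data = pd.DataFrame({'Insulin': ["NaN","1","NaN","2","NaN","3",
                                 "94","4","168","5","NaN"]})

# Normal(mu, sigma) truncated to [lower, upper]; truncnorm takes the
# bounds in standard-deviation units relative to loc.
lower, upper = 0, 400
mu, sigma = 155, 118
X = stats.truncnorm(
    (lower - mu) / sigma,
    (upper - mu) / sigma,
    loc=mu, scale=sigma)

# Missing values stored as the string "NaN": one fresh draw per row,
# kept only where the entry is missing.
data['Insulin'] = np.where(
    data['Insulin'] == "NaN",
    X.rvs(len(data)),
    data['Insulin'])

# The same for missing values stored as real (float) NaN.
data['Insulin'] = np.where(
    data['Insulin'].isna(),
    X.rvs(len(data)),
    data['Insulin'])

print(data)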

       Insulin
0    59.069239
1            1
2   113.143013
3            2
4    63.488282
5            3
6           94
7            4
8          168
9            5
10  109.272469
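The two np.where passes can be folded into one, assuming the column may be converted to floats: pd.to_numeric turns the string "NaN" into a real NaN, and a boolean mask then receives exactly as many draws as there are missing rows. If missing values are coded as 0, as in the question's loop, `data['Insulin'].replace(0, np.nan)` would be the equivalent first step. The seed is an arbitrary choice:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

data = pd.DataFrame({'Insulin': ["NaN","1","NaN","2","NaN","3",
                                 "94","4","168","5","NaN"]})

# "NaN" strings become real NaN; the column becomes float64
data['Insulin'] = pd.to_numeric(data['Insulin'], errors='coerce')

lower, upper = 0, 400
mu, sigma = 155, 118
X = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma,
                    loc=mu, scale=sigma)

mask = data['Insulin'].isna()
# Draw exactly one value per missing row and write them back in place
data.loc[mask, 'Insulin'] = X.rvs(size=mask.sum(), random_state=0)
print(data)
```

A side benefit is that the column ends up numeric instead of a mix of strings and floats.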