I have to add a weight column to the Titanic dataset, generating adult passengers' weights from a normal distribution with mean = 70 kg and std = 20 kg. I have tried this code:
df['Weight'] = np.random.normal(20, 70, size=891)
df['Weight'].fillna(df['Weight'].iloc[0], inplace=True)
but I am concerned about two things:
Edit:
This code worked for me
Weight = np.random.normal(80, 20, 718)
adults['Weight'] = Weight
Now I have to calculate the probability that a person weighs less than 70, and the probability that a person weighs between 70 and 100.
I have tried the following code, but it raises an error: TypeError: unsupported operand type(s) for -: 'str' and 'int'.
import pandas as pd
import numpy as np
import scipy.stats
adults = df[(df['Age'] >= 20) & (df['Age'] <= 70)]
Weight = np.random.normal(80, 20, 718)
adults['Weight'] = Weight
p1 = adults['Weight'] < 70
p2 = adults[(adults['Weight'] > 70) & (adults['Weight'] < 100)]
scipy.stats.norm.pdf(p1)
scipy.stats.norm.pdf(p2)
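On the TypeError: p2 is a whole DataFrame, so it most likely still carries string columns such as the passenger names, and norm.pdf tries to do arithmetic on them. Also, pdf gives a density, not a probability. A minimal sketch of one way to get the two probabilities uses the normal CDF with the same mean and std used for sampling. The stand-in adults frame and the seed below are assumptions for illustration, not the real Titanic data.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Stand-in for the filtered Titanic frame (718 rows, as in the question).
adults = pd.DataFrame({"Age": rng.integers(20, 71, size=718)})
adults["Weight"] = rng.normal(loc=80, scale=20, size=len(adults))

# Theoretical probabilities under N(mean=80, std=20).
p_below_70 = norm.cdf(70, loc=80, scale=20)                    # about 0.309
p_70_to_100 = norm.cdf(100, loc=80, scale=20) - p_below_70     # about 0.533

# Empirical proportions from the sampled column (vary with the sample).
emp_below_70 = (adults["Weight"] < 70).mean()
emp_70_to_100 = adults["Weight"].between(70, 100).mean()

print(p_below_70, p_70_to_100)
print(emp_below_70, emp_70_to_100)
```

The empirical proportions should land near the theoretical values but will not match them exactly, since they depend on the random sample.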
The range of a normal distribution is not restricted: it spans all real numbers. If you want to restrict it, you have to do so manually or use a different distribution.
df['Weight'] = np.random.normal(70, 20, size=891)  # loc (mean) = 70, scale (std) = 20
df.loc[df['Weight'] < min_value, 'Weight'] = min_value
df.loc[df['Weight'] > max_value, 'Weight'] = max_value
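As a rough sketch of both options: clamping can be done in one step with np.clip, and a truncated normal from scipy.stats.truncnorm is one example of "another distribution" that never leaves the bounds. The bounds of 40 and 150 kg, the seed, and the stand-in frame are assumptions picked for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.stats import truncnorm

mean, std = 70, 20
min_value, max_value = 40, 150  # hypothetical bounds in kg

rng = np.random.default_rng(0)
df = pd.DataFrame({"PassengerId": range(1, 892)})  # stand-in with 891 rows

# Option 1: sample from the normal, then clamp values to the bounds.
# This piles up probability mass exactly at min_value and max_value.
df["Weight_clipped"] = np.clip(rng.normal(mean, std, size=len(df)),
                               min_value, max_value)

# Option 2: sample from a truncated normal, which stays inside the bounds.
# truncnorm expects the bounds in standard-deviation units around loc.
a = (min_value - mean) / std
b = (max_value - mean) / std
df["Weight_truncated"] = truncnorm.rvs(a, b, loc=mean, scale=std,
                                       size=len(df), random_state=0)
```

Clamping is simpler, but it distorts the tails; the truncated normal keeps the bell shape inside the bounds, at the cost of a slightly shifted mean.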
Since the weights of children and adults are not identically distributed, you should sample them from different distributions:
# use different distributions
df.loc[df['person_type'] == 'child', 'Weight'] = np.random.normal(x1, y1, size=children_size)
df.loc[df['person_type'] == 'adult', 'Weight'] = np.random.normal(x2, y2, size=adult_size)
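A self-contained sketch of the same idea, assuming person_type is derived from the Age column. The cutoff of 18 years and the (mean, std) pairs are hypothetical placeholders for x1, y1, x2, y2, and the small frame stands in for the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the Titanic frame; only Age is needed here.
df = pd.DataFrame({"Age": [4, 9, 25, 40, 63, 15, 31]})

# Hypothetical rule: anyone under 18 counts as a child.
df["person_type"] = np.where(df["Age"] < 18, "child", "adult")

# Hypothetical (mean, std) in kg for each group.
params = {"child": (30, 8), "adult": (70, 20)}

df["Weight"] = np.nan
for person_type, (mean, std) in params.items():
    mask = df["person_type"] == person_type
    # Draw exactly as many samples as there are rows in this group.
    df.loc[mask, "Weight"] = rng.normal(mean, std, size=mask.sum())

print(df)
```

Using mask.sum() for the sample size avoids hard-coding children_size and adult_size, so the assignment cannot fail on a length mismatch.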