Given the dataframe:
Brick_cp = pd.DataFrame({"CP":Brick_cp})
which corresponds to this distribution:
sns.distplot(Brick_cp, fit = stats.norm)
I then fit a normal distribution to the values:
loc, scale = stats.norm.fit(Brick_cp.astype(float))
# Output: (911.1121589743589, 63.42365993765692)
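For a normal distribution, stats.norm.fit returns the maximum-likelihood estimates, which are simply the sample mean and the standard deviation computed with ddof=0 (dividing by n rather than n - 1). A minimal sketch checking this on hypothetical synthetic data; the array below is only a stand-in for the real Brick_cp values:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical stand-in for the CP column; NOT the real Brick_cp data
rng = np.random.default_rng(0)
data = rng.normal(loc=900.0, scale=60.0, size=500)

loc, scale = stats.norm.fit(data)

# The normal MLE is the sample mean and the ddof=0 standard deviation
print(np.isclose(loc, data.mean()), np.isclose(scale, data.std(ddof=0)))
```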
# Probability density function (PDF)
x = np.linspace(start=600, stop=1200, num=100)
pdf = stats.norm.pdf(x, loc=loc, scale=scale)
The corresponding CDF is:
cdf = stats.norm.cdf(x, loc=loc, scale=scale)
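The CDF is the integral of the PDF, so the area under the PDF between two points equals the difference of the CDF values at those points. A small numerical check, assuming the fitted loc and scale shown above and an interval [850, 950] chosen only for illustration:

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import trapezoid

loc, scale = 911.1121589743589, 63.42365993765692  # fitted values from the question

a, b = 850.0, 950.0  # illustrative interval
x = np.linspace(a, b, 1001)
pdf = stats.norm.pdf(x, loc=loc, scale=scale)

# Numerical integral of the PDF over [a, b]
area = trapezoid(pdf, x)
# The same probability obtained from the CDF
cdf_diff = stats.norm.cdf(b, loc=loc, scale=scale) - stats.norm.cdf(a, loc=loc, scale=scale)
print(area, cdf_diff)  # the two numbers agree closely
```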
Finally I create the percent point function (PPF), i.e. the inverse of the CDF:
cdf_ = np.linspace(start=0, stop=1, num=10000)
x_ = stats.norm.ppf(cdf_, loc=loc, scale=scale)
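The PPF maps a probability q to the x value below which a fraction q of the distribution lies, so applying the CDF to its output gives q back. Note that at q = 0 and q = 1 it returns -inf and +inf, which matters because np.linspace(0, 1, ...) includes both endpoints. A short check, assuming the fitted values shown above:

```python
import numpy as np
import scipy.stats as stats

loc, scale = 911.1121589743589, 63.42365993765692  # fitted values from the question

q = np.array([0.025, 0.5, 0.975])
x_q = stats.norm.ppf(q, loc=loc, scale=scale)

# CDF(PPF(q)) recovers q
round_trip = stats.norm.cdf(x_q, loc=loc, scale=scale)
print(np.allclose(round_trip, q))  # True

# The median of a normal distribution is its mean
print(np.isclose(x_q[1], loc))  # True

# At the endpoints the PPF is unbounded (-inf and inf)
edges = stats.norm.ppf([0.0, 1.0], loc=loc, scale=scale)
print(edges)
```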
The aim is to draw a predefined number of random values from this distribution. My idea was to generate uniform random values between 0 and 1, treat them as cumulative probabilities, and look up the corresponding x values (abscissae) through the PPF. Currently I do it like this:
v = np.random.uniform(0,1,1000)
f = lambda x1: np.interp(x1, cdf_, x_)
brick_cp_value = f(v)
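The idea described above is inverse transform sampling. Since SciPy already provides the exact PPF, the interpolation table can be skipped by passing the uniform draws straight to stats.norm.ppf. This also avoids an edge case of the interpolated version: because cdf_ includes 0 and 1, the first and last entries of x_ are -inf and +inf, so a draw falling in the first or last grid cell can come out non-finite. A sketch, assuming the fitted values shown above; the seed is an arbitrary choice for reproducibility:

```python
import numpy as np
import scipy.stats as stats

loc, scale = 911.1121589743589, 63.42365993765692  # fitted values from the question

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
u = rng.uniform(0.0, 1.0, size=1000)  # uniform draws in [0, 1)

# Inverse transform sampling: map each uniform draw through the exact PPF
samples = stats.norm.ppf(u, loc=loc, scale=scale)

print(samples.mean(), samples.std())  # close to loc and scale
```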
Is there an easier way to draw random samples with SciPy, and is the method I am using correct? I am a beginner, so any help is appreciated. Thanks!
Edit: I also tried this method:
random_samples = stats.norm.rvs(loc, scale, size=1000)
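The stats.norm.rvs call draws directly from the normal distribution; its first two positional arguments are loc and scale. As a sketch, passing random_state makes the draw reproducible (the seed is arbitrary), and stats.kstest offers a rough sanity check that the samples are consistent with the fitted normal:

```python
import numpy as np
import scipy.stats as stats

loc, scale = 911.1121589743589, 63.42365993765692  # fitted values from the question

# random_state accepts an int seed or a NumPy Generator
random_samples = stats.norm.rvs(loc=loc, scale=scale, size=1000, random_state=0)

# Kolmogorov-Smirnov test against the fitted normal; a large p-value means
# the samples show no evidence against that distribution
result = stats.kstest(random_samples, "norm", args=(loc, scale))
print(result.statistic, result.pvalue)
```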
Sampling from a Gaussian is very common, so there is a simple way to do it given the mean (loc) and standard deviation (scale) of the fitted distribution, e.g. with numpy.random.normal() (the stats.norm.rvs call from your edit works just as well):
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
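# Brick_cp holds the CP values from the question
Brick_cp = pd.DataFrame({"CP": Brick_cp})
# Histogram of the data with a fitted normal curve overlaid
sns.distplot(Brick_cp, fit=stats.norm)
# Fit a normal distribution: loc is the mean, scale the standard deviation
loc, scale = stats.norm.fit(Brick_cp.astype(float))
# Draw 1000 samples directly from the fitted normal
random_samples = np.random.normal(loc, scale, size=1000)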
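np.random.normal belongs to NumPy's legacy global-state interface; NumPy's documentation recommends the Generator API for new code. A minimal equivalent, assuming the fitted values printed in the question and an arbitrary seed:

```python
import numpy as np

loc, scale = 911.1121589743589, 63.42365993765692  # fitted values from the question

rng = np.random.default_rng(seed=12345)  # arbitrary seed, for reproducibility
random_samples = rng.normal(loc=loc, scale=scale, size=1000)

print(random_samples.shape)  # (1000,)
```

Creating a new Generator with the same seed reproduces the same samples, which is convenient when comparing runs.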