Tags: python, scipy, statistics, data-analysis, cdf

Selecting random values from a pdf with scipy


Given the dataframe:

Brick_cp = pd.DataFrame({"CP":Brick_cp})

which corresponds to this distribution:

sns.distplot(Brick_cp, fit = stats.norm)

[Figure: histogram of Brick_cp with the fitted normal curve]

I then fit a normal distribution to the values:

loc, scale = stats.norm.fit(Brick_cp.astype(float))

Out[]: (911.1121589743589, 63.42365993765692)

# Probability density function (PDF)
x = np.linspace(start=600, stop=1200, num=100)
pdf = stats.norm.pdf(x, loc=loc, scale=scale)

[Figure: plot of the PDF]

To which corresponds the CDF:

cdf = stats.norm.cdf(x, loc=loc, scale=scale)

[Figure: plot of the CDF]

Finally I create the percent point function (PPF), i.e. the inverse of the CDF:

cdf_ = np.linspace(start=0, stop=1, num=10000)
x_ = stats.norm.ppf(cdf_, loc=loc, scale=scale)

[Figure: plot of the PPF]
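
One detail worth checking in the grid above: `cdf_` includes the endpoints 0 and 1, where the normal PPF is infinite. The following is a minimal sketch of that check. It hard-codes the fitted `loc` and `scale` quoted earlier, which is an assumption made because the original data is not shown.

```python
import numpy as np
from scipy import stats

# Fitted parameters quoted in the question (hard-coded for illustration only).
loc, scale = 911.1121589743589, 63.42365993765692

cdf_ = np.linspace(start=0, stop=1, num=10000)
x_ = stats.norm.ppf(cdf_, loc=loc, scale=scale)

# The PPF is the inverse of the CDF, so probabilities 0 and 1 map to
# minus and plus infinity for a normal distribution.
print(x_[0], x_[-1])

# Round trip on the interior points: cdf(ppf(q)) should recover q.
round_trip = stats.norm.cdf(x_[1:-1], loc=loc, scale=scale)
print(np.allclose(round_trip, cdf_[1:-1]))
```

Because of those two infinite endpoints, a uniform draw that lands in the first or last interval of the grid is interpolated against an infinite value and can come back non-finite. A grid that stops short of 0 and 1 would avoid this, at the cost of truncating the tails.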

The aim is to generate a predefined number of random values distributed according to the PDF. To do this I thought of generating random values between 0 and 1, treating them as CDF values, and reading off the corresponding abscissa from the PPF. Currently I do it this way:

v = np.random.uniform(0,1,1000)
f = lambda x1: np.interp(x1, cdf_, x_)
brick_cp_value = f(v)

I would like to ask whether the method I am using is correct and whether there is an easier way to do random sampling in scipy. I am a beginner. Thanks

Edit: I also tried this method:

random_samples = stats.norm.rvs(loc, scale, size=1000)
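
For a reproducible run, `rvs` accepts a `random_state` argument. Here is a short sketch that also compares the sample mean and standard deviation with the fitted parameters. The hard-coded `loc` and `scale` and the seed value are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Fitted parameters quoted above (hard-coded for illustration only).
loc, scale = 911.1121589743589, 63.42365993765692

# Arbitrary seed so that repeated runs give the same samples.
rng = np.random.default_rng(0)
random_samples = stats.norm.rvs(loc=loc, scale=scale, size=1000, random_state=rng)

# The sample statistics should be close to loc and scale, not identical.
print(random_samples.shape)
print(random_samples.mean(), random_samples.std(ddof=1))
```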

Solution

  • Sampling from a Gaussian is very common, so there is a simple way to do it given the mean (loc) and standard deviation (scale) of the PDF, e.g. with numpy.random.normal():

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import scipy.stats as stats
    
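    # Brick_cp on the right-hand side is the raw data from the question
    Brick_cp = pd.DataFrame({"CP": Brick_cp})
    # Note: distplot is deprecated since seaborn 0.11
    sns.distplot(Brick_cp, fit=stats.norm)
    # Maximum-likelihood estimates of the mean (loc) and standard deviation (scale)
    loc, scale = stats.norm.fit(Brick_cp.astype(float))
    # Draw 1000 samples from the fitted normal distribution
    random_samples = np.random.normal(loc, scale, size=1000)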
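
On whether the interpolation approach in the question is correct: drawing uniform numbers and mapping them through the PPF is inverse transform sampling, so the idea is sound. The lookup table is not needed, because `ppf` can be evaluated on the uniform draws directly. The sketch below hard-codes the fitted values and uses an arbitrary seed, both assumptions for illustration. It compares the result with direct normal sampling.

```python
import numpy as np
from scipy import stats

# Fitted parameters quoted in the question (hard-coded for illustration only).
loc, scale = 911.1121589743589, 63.42365993765692
rng = np.random.default_rng(42)  # arbitrary seed

# Inverse transform sampling: uniform draws on [0, 1) pushed through the PPF.
# A draw of exactly 0 would map to -inf, which is possible but extremely unlikely.
u = rng.uniform(0.0, 1.0, size=1000)
brick_cp_value = stats.norm.ppf(u, loc=loc, scale=scale)

# Direct sampling from the same normal distribution, for comparison.
direct = rng.normal(loc, scale, size=1000)

# Two-sample Kolmogorov-Smirnov test: a small statistic suggests the two
# samples are consistent with a common distribution.
ks = stats.ks_2samp(brick_cp_value, direct)
print(ks.statistic, ks.pvalue)
```

For a normal distribution the direct call is the simpler choice. The `ppf` route is mainly useful when a distribution offers an inverse CDF but no dedicated sampler.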