I am trying to play around with a random distribution. Eventually I don't want it to be normal, but for the time being a normal distribution is easier.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
ws = norm.rvs(4.0, 1.5, size=100)
# `normed` is deprecated (removed in recent NumPy); `density=True` alone is enough
density, bins = np.histogram(ws, 50, density=True)
unity_density = density / density.sum()
fig, ((ax1, ax2)) = plt.subplots(nrows=1, ncols=2, sharex=True, figsize=(12,6))
widths = np.diff(bins)  # positive bin widths (right edge minus left edge)
ax1.bar(bins[:-1], unity_density, width=widths, align='edge')
ax2.bar(bins[:-1], unity_density.cumsum(), width=widths, align='edge')
fig.tight_layout()
Then I can visualize the CDF as a set of points.
density1 = unity_density.cumsum()
x = bins[1:]  # the cumulative sum through bin i is the CDF at that bin's right edge
y = density1
plt.plot(x, y, 'o')
What I have been trying to do is use the np.interp function on the output of np.histogram to obtain a smooth curve representing the CDF, and then extract the percent points and plot them. Ideally, I would like to do it both manually and using the ppf function from scipy. I always struggled with statistics as an undergraduate. I am in grad school now and am trying to put myself through as many exercises like this as possible to get a deeper understanding of what is happening. I've reached a point of desperation with this task. Thank you!
One possibility to get smoother results is to use more samples. With 10^5 samples and 100 bins the histogram and the CDF come out much smoother:
ws = norm.rvs(loc=4.0, scale=1.5, size=100000)
density, bins = np.histogram(ws, bins=100, density=True)
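To make the effect of the sample size concrete, here is a minimal sketch that compares the histogram-based CDF with the analytic normal CDF for 100 and for 10^5 samples. The helper `histogram_cdf`, the fixed seed, and the use of 100 bins in both cases are my own assumptions for illustration, not part of the question's code.

```python
import numpy as np
from scipy.stats import norm


def histogram_cdf(samples, n_bins):
    """Return the right bin edges and the cumulative probability at each edge."""
    counts, edges = np.histogram(samples, bins=n_bins)
    cdf = counts.cumsum() / counts.sum()
    return edges[1:], cdf


dist = norm(loc=4.0, scale=1.5)
rng = np.random.default_rng(0)  # fixed seed, assumed here only for reproducibility

max_errors = {}
for size in (100, 100_000):
    ws = dist.rvs(size=size, random_state=rng)
    x, cdf = histogram_cdf(ws, n_bins=100)
    # Largest gap between the empirical and the analytic CDF at the bin edges
    max_errors[size] = np.max(np.abs(cdf - dist.cdf(x)))
    print(size, max_errors[size])
```

The maximum deviation should shrink roughly like 1/sqrt(n), so the larger sample tracks the true CDF far more closely.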
In general you could use SciPy's interpolation module to smooth your CDF. For 100 samples and a smoothing factor of s=0.01 this looks as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splev, splrep
density1 = unity_density.cumsum()
x = bins[1:]  # cumulative values belong to the right bin edges
y = density1
# Interpolation
spl = splrep(x, y, s=0.01, per=False)
x2 = np.linspace(x[0], x[-1], 200)
y2 = splev(x2, spl)
# Plotting
fig, ax = plt.subplots()
ax.plot(x, y, 'o')
ax.plot(x2, y2, 'r-')
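Since the question mentions np.interp specifically, here is a sketch of the same idea with plain linear interpolation. A smoothing spline is not guaranteed to be monotone, whereas a piecewise-linear interpolant of the cumulative sums always is, which may matter if you want to invert the curve later. The seed and the variable names `x_fine` and `cdf_fine` are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm

# Same setup as in the question; the seed is an assumption for reproducibility
ws = norm.rvs(loc=4.0, scale=1.5, size=100, random_state=0)
counts, bins = np.histogram(ws, bins=50)

# CDF values at every bin edge: 0 at the left-most edge, 1 at the right-most
cdf = np.concatenate(([0.0], counts.cumsum() / counts.sum()))

# Evaluate the piecewise-linear CDF on a finer grid
x_fine = np.linspace(bins[0], bins[-1], 200)
cdf_fine = np.interp(x_fine, bins, cdf)
```

Plotting `x_fine` against `cdf_fine` gives a continuous curve through the points; it is less smooth than the spline but cannot overshoot or decrease.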
The third possibility is to calculate the CDF analytically. If you generate the noise yourself with a numpy / scipy function, most of the time there is already an implementation of the CDF available; otherwise you should find it on Wikipedia. If your samples come from measurements, that is of course a different story.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.linspace(-2, 10)
y = norm(loc=4.0, scale=1.5).cdf(x)
ax.plot(x, y, 'bo-')
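For the percent points the question asks about, one possible approach is to invert the histogram CDF with np.interp by swapping the roles of x and y, and then compare against `ppf` from scipy. This is only a sketch: the sample size, the seed, the chosen probabilities `q`, and the names `ppf_manual` and `ppf_exact` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

dist = norm(loc=4.0, scale=1.5)
ws = dist.rvs(size=100_000, random_state=0)  # seed assumed for reproducibility

counts, bins = np.histogram(ws, bins=100)
# CDF values at every bin edge, starting at 0 for the left-most edge
cdf = np.concatenate(([0.0], counts.cumsum() / counts.sum()))

# np.interp expects increasing x-coordinates, so drop repeated CDF values
# (these come from empty bins in the tails)
cdf_unique, first_idx = np.unique(cdf, return_index=True)
edges_unique = bins[first_idx]

q = np.array([0.05, 0.25, 0.5, 0.75, 0.95])

# "Manual" percent points: interpolate the bin edges as a function of the CDF
ppf_manual = np.interp(q, cdf_unique, edges_unique)

# Reference values from the analytic inverse CDF
ppf_exact = dist.ppf(q)

print(ppf_manual)
print(ppf_exact)
```

With few samples the manual values will be noticeably off in the tails, because only a handful of observations determine the CDF there.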