Trying to interpolate the output of a histogram function in Python

What I am trying to do is to play around with some random distribution. I don't want it to be normal. But for the time being normal is easier.

import matplotlib.pyplot as plt
from scipy.stats import norm

ws=norm.rvs(4.0, 1.5, size=100)
density, bins = np.histogram(ws, 50,normed=True, density=True)
unity_density = density / density.sum()

fig, ((ax1, ax2)) = plt.subplots(nrows=1, ncols=2, sharex=True, figsize=(12,6))
widths = bins[:-1] - bins[1:]

ax1.bar(bins[1:], unity_density, width=widths)
ax2.bar(bins[1:], unity_density.cumsum(), width=widths)

fig.tight_layout()

enter image description here Then what I can do it visualize CDF in terms of points.

density1=unity_density.cumsum()
x=bins[:-1]
y=density1

plt.plot(x, density1, 'o')

enter image description here

So what I have been trying to do is to use the np.interp function on the output of np.histogram in order to obtain a smooth curve representing the CDF and extracting the percent points to plot them. Ideally, I need to try to do it all both manually and using ppf function from scipy. I have always struggled with statistics as an undergraduate. I am in grad school now and try to put me through as many exercises like this as possible in order to get a deeper understanding of what is happening. I've reached a point of desperation with this task. Thank you!

Solution

One possibility to get smoother results is to use more samples, by using 10^5 samples and 100 bins I get the following images:

ws = norm.rvs(loc=4.0, scale=1.5, size=100000)
density, bins = np.histogram(ws, bins=100, normed=True, density=True)

histogram

In general you could use scipys interpolation module to smooth your CDF. For 100 samples and a smoothing factor of s=0.01 I get:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splev, splrep

density1 = unity_density.cumsum()
x = bins[:-1]
y = density1

# Interpolation
spl = splrep(x, y, s=0.01, per=False)
x2 = np.linspace(x[0], x[-1], 200)
y2 = splev(x2, spl)

# Plotting
fig, ax = plt.subplots()
plt.plot(x, density1, 'o')
plt.plot(x2, y2, 'r-')

The third possibility is to calculate the CDF analytically. If you generate the noise yourself with a numpy / scipy function most of the time there is already an implementation of the CDF available, otherwise you should find it on Wikipedia. If your samples come from measurements that is of course a different story.

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(-2, 10)
y = norm(loc=4.0, scale=1.5).cdf(x)
ax.plot(x, y, 'bo-')