How to plot a similar graph in Python?
import matplotlib.pylab as plt
import numpy as np
from scipy.stats import binom
y = binom.rvs(n = 10, p = 0.5, size = 100)
counts, bins = np.histogram(y, bins=50)
plt.scatter(bins[:len(counts)], counts)
plt.grid()
plt.show()
First off, when the data is discrete, the bin edges should lie between the values, so that each value sits in the middle of its own bin. Simply setting bins=50 chops the range between the lowest and the highest value into 50 equal-width bins. Most of these bins stay empty, because their start and end both lie between the same two integers.
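To make the difference concrete, here is a small sketch comparing the two binnings on the same sample. The seed and the variable names (`counts_50`, `edges_int`, and so on) are only chosen for illustration and are not part of the question's code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
y = rng.binomial(n=10, p=0.5, size=100)

# 50 equal-width bins between y.min() and y.max():
# each bin is narrower than 1, so most contain no integer at all
counts_50, edges_50 = np.histogram(y, bins=50)

# one bin per integer value, with edges at k - 0.5 and k + 0.5
edges_int = np.arange(y.min() - 0.5, y.max() + 1, 1)
counts_int, _ = np.histogram(y, bins=edges_int)

print("empty bins with bins=50:", np.count_nonzero(counts_50 == 0))
print("counts per integer value:", counts_int)
```

With the half-integer edges, every bin corresponds to exactly one possible value, so no bin is empty merely because of where the edges happen to fall.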
To show the values in a scatter plot, you can use the centers of the bins as the x-positions and the values 1, 2, ..., up to the bin's count as the y-positions, so that each bin becomes a stack of dots.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom
y = binom.rvs(n=10, p=0.5, size=100)
counts, bins = np.histogram(y, bins=np.arange(y.min() - 0.5, y.max() + 1, 1))
centers = (bins[:-1] + bins[1:]) / 2
for center, count in zip(centers, counts):
    plt.scatter(np.repeat(center, count), np.arange(count) + 1, marker='o', edgecolor='blue', color='none')
plt.grid(axis='y')
plt.ylim(bottom=0)
plt.show()
Here is an example with a continuous distribution and using filled squares instead of hollow circles as markers:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
y = np.random.randn(150).cumsum()
counts, bins = np.histogram(y, bins=30)
centers = (bins[:-1] + bins[1:]) / 2
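for center, count in zip(centers, counts):
    plt.scatter(np.repeat(center, count), np.arange(count) + 1,
                marker='s', color='crimson')
# put the grid behind the markers (an rc setting made here would come too late for the existing axes)
plt.gca().set_axisbelow(True)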
plt.grid(True, axis='y')
plt.gca().yaxis.set_major_locator(MultipleLocator(1))
plt.show()
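Both examples build the dot positions bin by bin inside a loop. As a possible variation, the positions can be computed up front and drawn with a single scatter call. This is only a sketch: `stacked_dot_positions` is a made-up helper name, not a NumPy or Matplotlib function.

```python
import numpy as np

def stacked_dot_positions(values, bins):
    """Hypothetical helper: x/y coordinates for a stacked dot plot.

    `bins` is passed straight to np.histogram, so it can be a bin count
    or an array of bin edges.
    """
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # every dot of a bin shares the bin center as its x-position
    x = np.repeat(centers, counts)
    # within a bin, the dots get the heights 1, 2, ..., count
    y = np.concatenate([np.arange(1, c + 1) for c in counts])
    return x, y

# tiny example: the values 1, 1, 2, 4 with one bin per integer
x, y = stacked_dot_positions(np.array([1, 1, 2, 4]), np.arange(0.5, 5, 1))
print(x, y)
```

The resulting arrays can then be passed to plt.scatter(x, y, marker='s', color='crimson') in place of the loop.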