python, matplotlib, scatter-plot

How to plot a histogram as a scatter plot


How can I plot a graph similar to this one in Python?

[Image: example]

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom

y = binom.rvs(n = 10, p = 0.5, size = 100)
counts, bins = np.histogram(y, bins=50)
plt.scatter(bins[:len(counts)], counts)
plt.grid()
plt.show()

Solution

  • First off, when the data is discrete, the bin edges should fall between the possible values. Setting bins=50 simply divides the range between the lowest and the highest value into 50 equal-width bins. Any bin whose start and end both lie between the same two consecutive integers cannot contain a value, so it stays empty.

    To show the values in a scatter plot, you can use the bin centers as the x-positions and the values 1, 2, ... up to the bin's count as the y-positions.

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import binom
    
    y = binom.rvs(n=10, p=0.5, size=100)
    counts, bins = np.histogram(y, bins=np.arange(y.min() - 0.5, y.max() + 1, 1))
    centers = (bins[:-1] + bins[1:]) / 2
    for center, count in zip(centers, counts):
        plt.scatter(np.repeat(center, count), np.arange(count) + 1, marker='o', edgecolor='blue', color='none')
    plt.grid(axis='y')
    plt.ylim(bottom=0)
    plt.show()
    

    [Image: histogram as a scatter plot]

    Here is an example with a continuous distribution, using filled squares instead of hollow circles as markers:

    import matplotlib.pyplot as plt
    from matplotlib.ticker import MultipleLocator
    import numpy as np
    
    y = np.random.randn(150).cumsum()
    counts, bins = np.histogram(y, bins=30)
    centers = (bins[:-1] + bins[1:]) / 2
    for center, count in zip(centers, counts):
        plt.scatter(np.repeat(center, count), np.arange(count) + 1,
                    marker='s', color='crimson')
    plt.gca().set_axisbelow(True)  # draw the grid beneath the markers; plt.rc() would not affect the existing axes
    plt.grid(True, axis='y')
    plt.gca().yaxis.set_major_locator(MultipleLocator(1))
    plt.show()
    

    [Image: histogram of a continuous distribution as a scatter plot]
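
The loops above call plt.scatter once per bin. As a possible variation, the positions of all dots can be computed up front and drawn with a single call. The following is a minimal sketch under that assumption; the helper name `stacked_dot_positions` is hypothetical and not part of NumPy or Matplotlib.

```python
import numpy as np


def stacked_dot_positions(values, bins):
    """Return (x, y) coordinates for a histogram drawn as stacked dots.

    Hypothetical helper for illustration; `bins` is passed straight
    through to np.histogram, so it may be a bin count or explicit edges.
    """
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # Repeat each bin center once per observation that fell into the bin.
    x = np.repeat(centers, counts)
    # For each bin, stack the dots at heights 1, 2, ..., count.
    y = np.concatenate([np.arange(1, c + 1) for c in counts])
    return x, y


# Small deterministic example: discrete data with half-integer bin edges.
data = np.array([1, 1, 2, 4])
edges = np.arange(data.min() - 0.5, data.max() + 1, 1)
x, y = stacked_dot_positions(data, edges)
print(x.tolist())  # [1.0, 1.0, 2.0, 4.0]
print(y.tolist())  # [1, 2, 1, 1]
```

The resulting arrays can then be drawn in one go, for example with plt.scatter(x, y, marker='s', color='crimson'), followed by the same grid and tick settings as in the examples above.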