python matplotlib scipy statistics normal-distribution

Fitting dictionary into normal distribution curve


Here is the dictionary:

l= {31.2: 1,35.1: 4,39.0: 13,42.9: 33,46.8: 115,50.7: 271,54.6: 363,58.5:381,62.4:379,66.3:370,70.2:256,74.1: 47,78.0: 2}

Each key is an observed value and each dictionary value is the number of times it occurred: 31.2 occurred once, 35.1 occurred 4 times, and so on. I tried plotting it:

fig, ax = plt.subplots(1, 1)

ax.scatter(l.keys(), l.values())
ax.set_xlabel('Key')
ax.set_ylabel('Length of value')


I also computed the mean and standard deviation with:

np.mean([k for k in l.keys()])
np.std([k for k in l.keys()])

Is this the right way to find the mean and standard deviation for this data? I doubt it, because it does not take into account the number of occurrences of each value. I want to draw the normal curve over this data. Also, is there a way to estimate how often a given value occurs? For example, if I extend the curve until it reaches 0 on the x-axis, how many data points (or what probability) would correspond to a value of 0?


Solution

  • Here is a way to draw a normal (Gaussian) curve fitted to the data:

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats
    
    l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}
    # expand the counts into a flat list: each key is repeated as many times as it occurred
    l_list = [k for k, v in l.items() for _ in range(v)]
    
    fig, ax = plt.subplots(1, 1)
    
    ax.scatter(l.keys(), l.values())
    ax.set_xlabel('Key')
    ax.set_ylabel('Length of value')
    
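    # mean and population standard deviation of the expanded data,
    # so each value is weighted by its number of occurrences
    mu = np.mean(l_list)
    sigma = np.std(l_list)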
    
    u = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 100)
    ax2 = ax.twinx()
    ax2.plot(u, stats.norm.pdf(u, mu, sigma), color='crimson')
    ax2.set_ylabel('normal curve')
    
    plt.show()
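
The same mean and standard deviation can probably be obtained without building the expanded list, by treating the counts as weights. The sketch below is one possible approach, not part of the original answer. It assumes the keys are equally spaced bin centers (spacing 3.9) and that the population standard deviation (the `np.std` default) is what is wanted. The names `counts`, `bin_width`, `expected_counts` and `p_below_zero` are illustrative only. It also shows how the fitted curve could be used to estimate how often a value is expected, including the tail at or below 0.

```python
import numpy as np
from scipy import stats

counts = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271,
          54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256,
          74.1: 47, 78.0: 2}

x = np.array(list(counts.keys()))    # observed values
w = np.array(list(counts.values()))  # number of occurrences of each value

# Weighted mean and population standard deviation (counts act as weights)
mu = np.average(x, weights=w)
sigma = np.sqrt(np.average((x - mu) ** 2, weights=w))

# Cross-check against the "expand into a flat list" approach
expanded = np.repeat(x, w)

# Expected count near each key under the fitted normal:
# density * total number of points * bin width
n_total = w.sum()
bin_width = 3.9  # assumption: spacing between consecutive keys
expected_counts = stats.norm.pdf(x, mu, sigma) * n_total * bin_width

# Probability of a value at or below 0 under the fitted normal,
# and the corresponding expected number of points in this sample
p_below_zero = stats.norm.cdf(0, mu, sigma)
expected_below_zero = p_below_zero * n_total

print(mu, sigma)
print(p_below_zero, expected_below_zero)
```

Because a normal density never actually reaches zero, the curve does not "touch" the x-axis at any point. The cumulative distribution function gives the probability of falling in a range, and multiplying it by the total count gives an expected number of points.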
    
