python plot histogram regression normal-distribution

Plot a line graph over a histogram for residual plot in python


I have written a script that plots a histogram of the NO2 vs Temperature residuals for a dataframe called nighttime.

The histogram shows the approximately normal distribution of the residuals from a regression fitted elsewhere in the Python script.

I am struggling to find a way to plot a bell curve over the histogram, like this example:

Plot Normal distribution with Matplotlib

How can I get a fitting normal distribution for my residual histogram?

import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

# nighttime is a pandas DataFrame defined elsewhere in the script
plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)

WSx_rm = nighttime['Temperature']
WSx_rm = sm.add_constant(WSx_rm)
NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing='drop').fit()
NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid))
# Histogram of residuals
ax = plt.hist(NO2_WS_RM_mod.resid)
plt.xlim(-40, 50)
plt.xlabel('Residuals')
plt.show()

Solution

  • Does the following work for you? (using some adapted code from the link you gave)

    import scipy.stats as stats

    plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)

    WSx_rm = nighttime['Temperature']
    WSx_rm = sm.add_constant(WSx_rm)
    NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing='drop').fit()
    NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid))
    # Histogram of residuals. density=True normalises the bars to unit area,
    # so they share a y-scale with the probability density curve drawn below.
    plt.hist(NO2_WS_RM_mod.resid, density=True)
    plt.xlim(-40, 50)
    plt.xlabel('Residuals')

    # New code: draw the fitted normal distribution
    residuals = sorted(NO2_WS_RM_mod.resid)  # sorted so the line is drawn left to right
    normal_distribution = stats.norm.pdf(residuals, np.mean(residuals), np.std(residuals))
    plt.plot(residuals, normal_distribution)

    plt.show()  # note the parentheses: a bare plt.show does nothing
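
A probability density and a count histogram sit on different y-scales. If you would rather keep the y-axis in counts, one option is to multiply the density by (number of residuals × bin width). The sketch below is only an illustration under stated assumptions: `scaled_normal_curve` and `plot_overlay` are hypothetical helper names, the residuals are synthetic stand-ins rather than your data, and the bins are assumed to be the 10 equal-width bins that `plt.hist` uses by default. The curve itself is computed with the standard library, so matplotlib is needed only for the drawing step.

```python
import random
from statistics import NormalDist, fmean, pstdev


def scaled_normal_curve(residuals, n_bins=10, n_points=200):
    """Return (xs, ys): a fitted normal curve scaled to a count histogram."""
    lo, hi = min(residuals), max(residuals)
    bin_width = (hi - lo) / n_bins  # equal-width bins over [min, max]
    # pstdev is the population standard deviation, matching np.std's default
    fitted = NormalDist(fmean(residuals), pstdev(residuals))
    scale = len(residuals) * bin_width  # density -> expected count per bin
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    ys = [fitted.pdf(x) * scale for x in xs]
    return xs, ys


def plot_overlay(residuals, n_bins=10):
    """Draw the count histogram with the scaled curve on top (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependencies

    xs, ys = scaled_normal_curve(residuals, n_bins=n_bins)
    plt.hist(residuals, bins=n_bins)
    plt.plot(xs, ys)
    plt.xlabel('Residuals')
    plt.show()


# Synthetic stand-in for NO2_WS_RM_mod.resid (made-up data, not the real residuals)
rng = random.Random(0)
fake_residuals = [rng.gauss(0, 12) for _ in range(500)]
xs, ys = scaled_normal_curve(fake_residuals)
```

With your model you would call something like `plot_overlay(list(NO2_WS_RM_mod.resid))`. If you pass a different `bins` value to `plt.hist`, pass the same number as `n_bins` so the scaling still matches the bars.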