Tags: python, plot, curve-fitting, gaussian, least-squares

Is this the correct way of fitting data generated from a Gaussian distribution in Python?


I am trying to write a Python program that generates data as the sum of a Gaussian random variable and a 4th-degree polynomial (3x^4 + x^3 + 3x^2 + 4x + 5). The task is then to use a least-squares polynomial fit to model the generated data, adjusting the model until it accurately predicts all the values. I am new to Python and really trying to catch up with my fast-paced class, so any help and further explanation would be appreciated. I tried it without the for loop and it gave two curves, but I thought the fitted curve should match the initial points. Please see the code below:

import random
import matplotlib.pyplot as plt
import numpy as np

def rSquared(obs, predicted):
    # Coefficient of determination: 1 - (mean squared error / variance of obs)
    error = ((predicted - obs)**2).sum()
    mean = error/len(obs)
    return 1 - (mean/np.var(obs))

def generateData(a, b, c, d, e, xvals):
    # Evaluate the quartic at each x and add Gaussian noise (mean 0, std dev 35)
    yvals = []
    for x in xvals:
        calcVal = a*x**4 + b*x**3 + c*x**2 + d*x + e
        yvals.append(calcVal + random.gauss(0, 35))
    return yvals

xvals = np.arange(-10, 11, 1)
a, b, c, d, e = 3, 1, 3, 4, 5
yvals = generateData(a, b, c, d, e, xvals)

# Fit polynomials of degree 0 through 4; after the loop,
# model and estYvals hold the last (degree-4) fit
for i in range(5):
    model = np.polyfit(xvals, yvals, i)
    estYvals = np.polyval(model, xvals)
print('R-Squared:', rSquared(yvals, estYvals))

# Noisy data as a red line, fitted values as blue dots
plt.plot(xvals, yvals, 'r', label='Actual values')
plt.plot(xvals, estYvals, 'bo', label='Predicted values')
plt.xlabel('Variable x values')
plt.ylabel('Calculated Value of Polynomial')
plt.legend()
plt.show()

The result of running this program is shown in this image: [image: myplot1]. Without the for loop, this is what I get: [image: myPlot2].


Solution

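  • Yes, this is the correct approach. You just have to fit with degree 4, the same degree as the polynomial that generated the data:

    model = np.polyfit(xvals, yvals, i) # i = 4

    With i = 4 the model fits the values almost perfectly; the R-Square of 4 was reported as 0.9999995005089268.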
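
    To see why degree 4 is the one to use, it can help to print the R-squared for every degree the loop tries rather than only the last one. The following is a minimal sketch of that idea, not the asker's exact program: the helper name r_squared, the scores dictionary, and the fixed seed are assumptions added here so the run is repeatable, and the noise is drawn with NumPy's random generator instead of random.gauss.

```python
import numpy as np

def r_squared(obs, predicted):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
    obs = np.asarray(obs, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - predicted) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Fixed seed is an assumption for repeatability; the original code is unseeded
rng = np.random.default_rng(0)

xvals = np.arange(-10, 11, 1)
true_coeffs = [3, 1, 3, 4, 5]  # 3x^4 + x^3 + 3x^2 + 4x + 5
# Polynomial values plus Gaussian noise with mean 0 and standard deviation 35
yvals = np.polyval(true_coeffs, xvals) + rng.normal(0, 35, size=xvals.size)

scores = {}
for degree in range(5):
    model = np.polyfit(xvals, yvals, degree)       # least-squares fit of this degree
    est_yvals = np.polyval(model, xvals)           # predictions at the same x values
    scores[degree] = r_squared(yvals, est_yvals)
    print('R-Squared of', degree, ':', scores[degree])
```

    Because each higher degree contains the lower ones as special cases, the R-squared on the fitted data can only stay the same or increase as the degree goes up. It will not reach exactly 1 at degree 4, since the Gaussian noise cannot be captured by the polynomial.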