python numpy curve-fitting polynomials data-fitting

Not getting the right value with numpy polyfit


import numpy as np
from matplotlib import pyplot as plt
a = np.empty((2, 8))  # uninitialized 2x8 array; both rows are filled below

a[0] = [0,10,21.5,25.2,70,89,112,150] # row for all X values
a[1] = [0,5,10,15,20,25,30,35] # row for all Y values


# Curve fit with a 7th-order polynomial
trend = np.polyfit(a[0], a[1], 7)
trendpoly = np.poly1d(trend)  # callable polynomial built from the 7th-order coefficients
plt.plot(a[0], trendpoly(a[0]))  # fit, evaluated only at the data points
plt.plot(a[0], a[1])  # raw data

# Evaluate the fit at X = 100, which lies between two data points
Y4 = trendpoly(100)
plt.scatter(100, Y4)

print(Y4)

The resulting plot looks like this: a perfect fit to the data. Two overlapping curves of the data and the curve fit, with a scatter point showing the value at X = 100, which is way off from the curve fit.


What's going wrong here? Why is the value of trendpoly(100) not consistent with the curve fit?


Solution

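  • This is an overfitting problem.

    The higher the order of the polynomial, the more it overfits. With 8 data points, a 7th-order polynomial passes through every point exactly, but it is free to oscillate between them. The plot hides this because trendpoly is only evaluated at the data points, which are then joined by straight lines, and X = 100 falls between two data points. Try an order of 3 or less to observe the change.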

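To make the overfitting visible, here is a minimal sketch that compares the 7th-order fit with a 3rd-order one using the data from the question. The names x, y, p7, p3, x_dense and the gap variables are introduced here for illustration only, and the straight-line interpolation via np.interp is just a stand-in for what plt.plot(a[0], a[1]) draws, not a claim about the "true" curve.

```python
import numpy as np

# Data copied from the question.
x = np.array([0, 10, 21.5, 25.2, 70, 89, 112, 150], dtype=float)
y = np.array([0, 5, 10, 15, 20, 25, 30, 35], dtype=float)

# Degree 7 through 8 points interpolates them; degree 3 is a least-squares fit.
p7 = np.poly1d(np.polyfit(x, y, 7))
p3 = np.poly1d(np.polyfit(x, y, 3))

# Largest miss at the data points themselves.
err7_at_data = np.max(np.abs(p7(x) - y))
err3_at_data = np.max(np.abs(p3(x) - y))

# Distance from the straight line joining the neighbouring data points at X = 100.
x_query = 100.0
y_line = np.interp(x_query, x, y)
gap7 = abs(p7(x_query) - y_line)
gap3 = abs(p3(x_query) - y_line)

# Dense grid (assumed 500 points) for inspecting the fits between data points.
x_dense = np.linspace(x.min(), x.max(), 500)
y7_dense = p7(x_dense)
y3_dense = p3(x_dense)

print("max error at data points: deg 7 =", err7_at_data, " deg 3 =", err3_at_data)
print("gap to straight line at X = 100: deg 7 =", gap7, " deg 3 =", gap3)
```

The 7th-order fit wins at the data points but drifts further from the straight-line trend at X = 100 than the 3rd-order fit does. Plotting plt.plot(x_dense, y7_dense) instead of plt.plot(a[0], trendpoly(a[0])) should show the oscillation that the original plot hides.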