import numpy as np
from matplotlib import pyplot as plt
a = np.empty((2,8)) # uninitialized 2x8 array, filled in below
a[0] = [0,10,21.5,25.2,70,89,112,150] # row for all X values
a[1] = [0,5,10,15,20,25,30,35] # row for all Y values
#Value by curve fitting - 7th order polynomial
trend = np.polyfit(a[0], a[1], 7)
trendpoly = np.poly1d(trend) #Polynomial built from the coefficients of the 7th order fit
plt.plot(a[0],trendpoly(a[0]))
plt.plot(a[0],a[1])
Y4 = trendpoly(100)
plt.scatter(100,Y4)
print(Y4)
The resulting plot looks like a perfect fit: the data curve and the fitted curve overlap. However, the scatter point showing the value at X = 100 lies far off the fitted curve.
What's going wrong here? Why is the value of trendpoly(100) not consistent with the curve fit?
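This is overfitting. With 8 data points, a 7th-order polynomial passes through every point exactly, and the plot evaluates trendpoly only at the data points in a[0], so the two curves overlap. Between those points the polynomial can swing far away from the data, and X = 100 falls in such a gap.
The higher the order of the polynomial, the stronger the overfitting. Try an order of 3 or less to observe the change.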
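A minimal sketch of that comparison, using the data from the question. The names `x_dense` and `results` are illustrative and not from the original code. It fits orders 7 and 3, then reports the largest residual at the data points, the value at X = 100, and the range each polynomial covers on a dense grid between the data points:

```python
import numpy as np

# Data from the question
x = np.array([0, 10, 21.5, 25.2, 70, 89, 112, 150], dtype=float)
y = np.array([0, 5, 10, 15, 20, 25, 30, 35], dtype=float)

# Dense grid (illustrative name) to see what the polynomial does
# between the data points, not only at them
x_dense = np.linspace(x.min(), x.max(), 500)

results = {}
for degree in (7, 3):
    # polyfit may warn that the order-7 fit is poorly conditioned
    poly = np.poly1d(np.polyfit(x, y, degree))
    dense_values = poly(x_dense)
    results[degree] = {
        "max_residual": float(np.max(np.abs(poly(x) - y))),  # error at the data points
        "at_100": float(poly(100)),                          # the value asked about
        "dense_min": float(dense_values.min()),              # range between data points
        "dense_max": float(dense_values.max()),
    }
    print(degree, results[degree])
```

Plotting `x_dense` against `poly(x_dense)` instead of `a[0]` against `trendpoly(a[0])` should make the oscillation of the order-7 fit visible, and the point at X = 100 then sits on the plotted curve.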