
Not getting the right value with numpy polyfit

import numpy as np
from matplotlib import pyplot as plt
a = np.empty((2, 8)) # 2 rows (X and Y) by 8 points; every entry is assigned below

a[0] = [0,10,21.5,25.2,70,89,112,150] # row for all X values
a[1] = [0,5,10,15,20,25,30,35] # row for all Y values

#Value by curve fitting - 7th order polynomial
trend = np.polyfit(a[0], a[1], 7) # coefficients, highest power first
trendpoly = np.poly1d(trend) #Callable polynomial built from the 7th order coefficients

Y4 = trendpoly(100)


The resulting plot looks like this: the fit appears to match the data perfectly.

[Image: two overlapping curves of the data and the curve fit, with a scatter point showing the value at X = 100, which is way off from the curve fit]

What's going wrong here? Why is the value of trendpoly(100) not consistent with the curve fit?


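  • This is an overfitting problem.

    The higher the order of the polynomial, the more it overfits. With 8 data points, a 7th order polynomial can pass exactly through every point and still swing widely between them, which is why the value at X = 100 is so far off. Try an order of 3 or less to observe the change.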

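To make the overfitting point concrete, here is a rough sketch that refits the question's data with orders 7 and 3 and compares the two. The names `high`, `low`, and `grid` are illustrative and not from the original post, and the 500-point grid is an arbitrary choice.

```python
import numpy as np

# Data from the question
x = np.array([0, 10, 21.5, 25.2, 70, 89, 112, 150])
y = np.array([0, 5, 10, 15, 20, 25, 30, 35])

# Order 7 with 8 points: as many coefficients as data points
high = np.poly1d(np.polyfit(x, y, 7))
# Order 3: a lower-order least-squares fit, as the answer suggests
low = np.poly1d(np.polyfit(x, y, 3))

# Largest miss at the original data points for each fit
max_resid_high = np.max(np.abs(high(x) - y))
max_resid_low = np.max(np.abs(low(x) - y))
print("max residual at data points:", max_resid_high, max_resid_low)

# A dense grid shows what each polynomial does between the data points
grid = np.linspace(x.min(), x.max(), 500)
print("order 7 range on grid:", high(grid).min(), high(grid).max())
print("order 3 range on grid:", low(grid).min(), low(grid).max())

# The value the question asks about
print("value at X = 100:", high(100), low(100))
```

Plotting `high(grid)` and `low(grid)` against `grid`, rather than only at the original X values, should make any swings between the data points visible.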