python machine-learning artificial-intelligence data-science

Why do we use plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x))) instead of plt.plot(x_poly, lin_reg2.predict(x_poly))?

Here is the code:

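from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt

# x (position levels) and y (salaries) are assumed to be loaded already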

poly_reg = PolynomialFeatures(degree = 4)      
x_poly = poly_reg.fit_transform(x)

lin_reg2 = LinearRegression()
lin_reg2.fit(x_poly, y)

plt.title("Polynomial Regression")
plt.xlabel("Position Level")
plt.ylabel("Salary")
plt.scatter(x, y, color ='red')
plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x)), color = 'blue')
plt.show()

Why do we use the following code:

plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x)), color = 'blue')

instead of:

plt.plot(x_poly, lin_reg2.predict(x_poly), color = "blue")

Solution

In this line of code:

plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x)), color = 'blue')

1) We pass x rather than x_poly as the first argument because the horizontal axis of the plot should show your original x values (the position levels), so that you can read off the prediction for each one. x_poly is not a centred or rescaled copy of x. With PolynomialFeatures(degree = 4), fit_transform() expands every x value into a row of polynomial terms (1, x, x^2, x^3, x^4), so x_poly is a matrix with five columns. If you pass that matrix as the first argument, plt.plot treats every column as its own set of x coordinates, and the result is no longer a plot of salary against position level.
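A quick way to check what fit_transform() returns is to print the shapes. This is only a small sketch: the x values are made up (position levels 1 to 5), since the question does not show the data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: the question does not show x, so assume position levels 1..5
x = np.arange(1, 6).reshape(-1, 1)   # shape (5, 1): one feature per row

poly_reg = PolynomialFeatures(degree=4)
x_poly = poly_reg.fit_transform(x)

print(x.shape)       # (5, 1)
print(x_poly.shape)  # (5, 5): columns are 1, x, x^2, x^3, x^4
print(x_poly[1])     # row for x = 2, i.e. 1, 2, 4, 8, 16
```

x has one column, which is what belongs on the horizontal axis. x_poly has five, and only the model needs those.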

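2) The model was fitted on the polynomial features, so the input to predict() must be the transformed x values, not the raw ones. This is why x is transformed first and the result is passed to predict(). Since x_poly was computed from the same x, predict(x_poly) returns the same predictions, so plt.plot(x, lin_reg2.predict(x_poly), color = 'blue') would draw the same curve. Calling poly_reg.transform(x) would also be enough here, because poly_reg has already been fitted.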
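If you want to convince yourself that these ways of calling predict() agree, a minimal check could look like this. The x and y arrays are invented stand-ins for the position level and salary data, which the question does not include.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data standing in for position levels and salaries
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

poly_reg = PolynomialFeatures(degree=4)
x_poly = poly_reg.fit_transform(x)

lin_reg2 = LinearRegression()
lin_reg2.fit(x_poly, y)

# Three ways to build the input for predict(); all use the same x
pred_from_x_poly = lin_reg2.predict(x_poly)
pred_from_refit = lin_reg2.predict(poly_reg.fit_transform(x))
pred_from_transform = lin_reg2.predict(poly_reg.transform(x))

same = (np.allclose(pred_from_x_poly, pred_from_refit)
        and np.allclose(pred_from_x_poly, pred_from_transform))
print(same)
```

The transform step only becomes necessary when you predict for points that are not in x, for example a finer grid of position levels used to draw a smoother curve.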

Hope this helped!