python machine-learning curve-fitting knn

Python: KNN regression fitting returns error

I am working on some machine learning practice exercises and I keep getting an error when I run the following code:

from sklearn.neighbors import KNeighborsRegressor
import numpy as np
import matplotlib.pyplot as plt

N=51
SD=1.15
ME=0
E=np.random.normal(ME, SD, N)
X = np.linspace(-4,4, N, endpoint=True)
Y = X**2 + E

neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X, Y)

X_eval = np.linspace(0,4,1000)
X_eval = X_eval.reshape(-1,1)

plt.figure()
plt.plot(X_eval,neigh.predict(X_eval), label="regression predictor")
plt.plot(X,Y, 'rs', markersize=12, label="training set")
plt.show()

The error is raised on the neigh.fit() line:

ValueError: Expected 2D array, got 1D array instead: array=[all the generated x-values]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

But fitting it this way does not work either. I am fairly new to machine learning and Python programming, so my apologies if this question is trivial, but what could I change to make my code run? Thanks in advance!

Solution

The key to solving this is understanding the error message. It says that scikit-learn expected a 2-D array but received a 1-D one. The culprit is X: fit() expects the features as an array of shape (n_samples, n_features), so X needs to be reshaped like this:

X_NEW = X.reshape(-1,1)

reshape(-1,1) turns the 1-D array into a 2-D array with a single column, so each row holds exactly one element (one feature per sample). The -1 tells NumPy to infer the number of rows from the array's length; in this case that gives 51, so the shape goes from (51,) to (51, 1).
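If it helps to see the difference, here is a minimal sketch that only prints the shapes involved. The names X_col and X_row are just illustrative and not part of your code; it also shows the other reshape mentioned in the error message, which would be the wrong one here because it describes a single sample with 51 features.

```python
import numpy as np

X = np.linspace(-4, 4, 51)   # 1-D array, as in the question

X_col = X.reshape(-1, 1)     # 51 rows, 1 column: 51 samples with 1 feature each
X_row = X.reshape(1, -1)     # 1 row, 51 columns: 1 sample with 51 features

print(X.shape)      # (51,)
print(X_col.shape)  # (51, 1)
print(X_row.shape)  # (1, 51)
```

Only the (51, 1) version matches what fit() needs for this data, since every x-value is its own sample.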

The code below runs in my terminal:

#UPDATED CODE
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
import matplotlib.pyplot as plt

N=51
SD=1.15
ME=0
E=np.random.normal(ME, SD, N)
X = np.linspace(-4,4, N, endpoint=True)
X_NEW = X.reshape(-1,1)  # 2-D array of shape (51, 1), as fit() expects
Y = X**2 + E

neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X_NEW, Y)

X_eval = np.linspace(0,4,1000)
X_eval = X_eval.reshape(-1,1)

plt.figure()
plt.plot(X_eval,neigh.predict(X_eval), label="regression predictor")
plt.plot(X,Y, 'rs', markersize=12, label="training set")
plt.legend()  # needed for the labels above to show up in the plot
plt.show()