Tags: python, machine-learning, curve-fitting, knn

Python: KNN regression fitting returns error


I am working on some machine learning practice exercises and I keep getting an error when I run the following code:

from sklearn.neighbors import KNeighborsRegressor
import numpy as np
import matplotlib.pyplot as plt

N=51
SD=1.15
ME=0
E=np.random.normal(ME, SD, N)
X = np.linspace(-4,4, N, endpoint=True)
Y = X**2 + E

neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X, Y)

X_eval = np.linspace(0,4,1000)
X_eval = X_eval.reshape(-1,1)

plt.figure()
plt.plot(X_eval,neigh.predict(X_eval), label="regression predictor")
plt.plot(X,Y, 'rs', markersize=12, label="training set")
plt.show()

The error is raised on the neigh.fit() line:

ValueError: Expected 2D array, got 1D array instead: array=[all the generated x-values]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

But reshaping the data as the message suggests does not work either. I am fairly new to machine learning and Python programming, so my apologies if this question is trivial, but what could I change to make my code run? Thanks in advance!


Solution

  • The key to solving this is understanding the error message: fit() expects a 2-D array of shape (n_samples, n_features), but you passed a 1-D array. Specifically, the problem is with X, which needs to be reshaped like this:

    X_NEW = X.reshape(-1,1)
    

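    reshape(-1,1) turns the 1-D array of shape (51,) into a 2-D array of shape (51, 1), so each row holds exactly one element (one feature per sample). The -1 tells numpy to infer the number of rows from the array's length; in this case, we get 51. Note that reshape returns a new array and leaves X itself unchanged, so the result has to be assigned to a variable and that variable passed to fit().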

    The code below runs in my terminal:

    #UPDATED CODE
    from sklearn.neighbors import KNeighborsRegressor
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Noisy samples of y = x**2 on [-4, 4]
    N=51
    SD=1.15
    ME=0
    E=np.random.normal(ME, SD, N)
    X = np.linspace(-4,4, N, endpoint=True)
    X_NEW = X.reshape(-1,1)  # 2-D column of shape (51, 1), as fit() expects
    Y = X**2 + E
    
    neigh = KNeighborsRegressor(n_neighbors=2)
    neigh.fit(X_NEW, Y)
    
    # The evaluation points must be 2-D as well
    X_eval = np.linspace(0,4,1000)
    X_eval = X_eval.reshape(-1,1)
    
    plt.figure()
    plt.plot(X_eval,neigh.predict(X_eval), label="regression predictor")
    plt.plot(X,Y, 'rs', markersize=12, label="training set")
    plt.legend()  # display the labels passed to plot()
    plt.show()
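
If it helps to see what the two reshape calls from the error message actually produce, here is a small sketch that only prints shapes and fits on noise-free data. The names X_col, X_row and X_alt are mine, purely for illustration, and I dropped the random noise so the example is deterministic:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.linspace(-4, 4, 51)   # 1-D array, shape (51,)

X_col = X.reshape(-1, 1)     # shape (51, 1): 51 samples, 1 feature each
X_row = X.reshape(1, -1)     # shape (1, 51): 1 sample with 51 features

# reshape returns a new array; X itself is still 1-D afterwards
print(X.shape, X_col.shape, X_row.shape)

# Indexing with np.newaxis is an equivalent way to build the column
X_alt = X[:, np.newaxis]
print(np.array_equal(X_col, X_alt))

# Noise-free targets (an assumption made for this sketch only)
Y = X**2

model = KNeighborsRegressor(n_neighbors=2)
model.fit(X_col, Y)          # the 2-D column is accepted by fit()

# predict() needs 2-D input too, even for a single query point
pred = model.predict(np.array([[2.0]]))
print(pred.shape)
```

For a single feature like here, the column form (-1, 1) is the one you want. The row form (1, -1) would describe one sample with 51 features, which would not match the 51 values in Y.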