
Getting "ValueError: Found input variables with inconsistent numbers of samples: [35, 311]" with a KNN classifier in Python


I'm practicing this ML code, but I get the error "ValueError: Found input variables with inconsistent numbers of samples: [70, 276]".

The code is as follows:

    X = Feature
    X[0:5]
    y = df['loan_status'].values
    y[0:5]
    X= preprocessing.StandardScaler().fit(X).transform(X)
    X[0:5]
    from sklearn.metrics import jaccard_similarity_score, log_loss, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier



    # Split data (I also get an error here)
    Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.2, random_state=6)
    # Let k = 6
    k = 6
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xtrn, ytrn)
    knn
    y_pred = knn.predict(Xtrn)
    y_pred[0:5]


    # I get the error here
    print("Jaccard Score in train set= ", jaccard_similarity_score(ytrn, knn.predict(Xtrn)))
    print("F1 Score in train set= ", f1_score(ytrn, knn.predict(Xtrn), average='weighted'))
    print("F1 Score in test set=  ", f1_score(ytst, y_pred, average='weighted'))
    print("Jaccard Score in test set= ", jaccard_similarity_score(ytst, y_pred))

Solution

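  • The error means that two arrays passed to the same scikit-learn function hold different numbers of samples (rows). Start by checking the shapes of every pair of arrays you pass together.

    shape is an attribute, not a method, so it takes no parentheses. E.g. X.shape, y.shape, ytst.shape, y_pred.shape

    If two arrays hold the same samples in a different layout, reshape can align them. reshape cannot change the number of samples, though. In the question, 70 and 276 match the test and train sizes of a 0.2 split, which suggests that y_pred (computed from Xtrn) is being scored against ytst. Predicting on Xtst for the test-set scores should remove the mismatch.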
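
    The shape check above can be sketched as follows. This is a minimal illustration, not the asker's actual pipeline: the loan data is replaced by a synthetic dataset from make_classification, the 346 rows and 8 features are assumptions chosen so that a 0.2 split gives 276 and 70 samples, and the names pred_trn, pred_tst and mismatch_message are made up for this example. Only f1_score is used, because jaccard_similarity_score is no longer available in recent scikit-learn releases (jaccard_score replaces it).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption: 346 rows, 8 features).
X, y = make_classification(n_samples=346, n_features=8, random_state=6)
X = StandardScaler().fit_transform(X)

Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.2, random_state=6)

# Step 1: compare the first dimension of every pair of arrays used together.
print(Xtrn.shape, ytrn.shape)  # train features and train labels
print(Xtst.shape, ytst.shape)  # test features and test labels

knn = KNeighborsClassifier(n_neighbors=6).fit(Xtrn, ytrn)

# Step 2: keep one prediction array per split so the lengths always match.
pred_trn = knn.predict(Xtrn)
pred_tst = knn.predict(Xtst)

# Scoring test labels against train predictions reproduces the ValueError.
try:
    f1_score(ytst, pred_trn, average="weighted")
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
print(mismatch_message)

# Matching each label array with predictions from the same split works.
train_f1 = f1_score(ytrn, pred_trn, average="weighted")
test_f1 = f1_score(ytst, pred_tst, average="weighted")
print("F1 score on train set =", train_f1)
print("F1 score on test set  =", test_f1)
```

    Keeping separate, clearly named prediction arrays for the train and test splits makes it harder to pair labels from one split with predictions from the other.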