
MemoryError while fitting a sparse matrix to kNN model


When I run the code below, the last line raises a MemoryError:

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=7)
clf.fit(train_X, y_train)
y_pred_clf = clf.predict(test_X)

test_X is a <10852x112 sparse matrix of type '<class 'numpy.float64'>' with 97668 stored elements in Compressed Sparse Row format>.

Any suggestions?


Solution

  • There are two options: predict on the data in batches, or use a different algorithm for the kNN model:

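    # kd_tree cannot be built on sparse input, so convert to dense arrays first
    # (assumes train_X and test_X are both SciPy sparse matrices)
    clf = KNeighborsClassifier(n_neighbors=7, algorithm='kd_tree').fit(train_X.toarray(), y_train)
    y_pred_clf = clf.predict(test_X.toarray())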
    

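    By default the model uses algorithm='auto', which picks brute force for sparse input, and brute force can take too much memory because it computes the distances between test and training samples. Note that scikit-learn cannot build a kd_tree on a sparse matrix: passing algorithm='kd_tree' with sparse input only emits a warning and falls back to brute force. To actually use the tree, convert the data to dense arrays (for example with .toarray()). With only 112 columns this is cheap: the dense test_X takes about 10852 × 112 × 8 bytes ≈ 9.7 MB.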
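A minimal sketch of the batching option, assuming a fitted classifier and a row-sliceable test matrix (a NumPy array or a CSR matrix). The helper name predict_in_batches, the batch size, and the random stand-in data (sized to roughly match the question's density of about 8% stored elements) are hypothetical. Each predict call only needs distances for one batch of test rows, which bounds peak memory. Recent scikit-learn releases already chunk the brute-force distance computation internally (bounded by the working_memory setting of sklearn.set_config), so explicit batching matters most on older versions.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import KNeighborsClassifier


def predict_in_batches(clf, X, batch_size=1000):
    """Hypothetical helper: predict labels for X one block of rows at a time,
    so only a (batch_size x n_train) block of distances is needed per call."""
    parts = []
    for start in range(0, X.shape[0], batch_size):
        # Row slicing works for both NumPy arrays and CSR matrices
        parts.append(clf.predict(X[start:start + batch_size]))
    return np.concatenate(parts)


# Hypothetical stand-in data: 112 columns and ~8% nonzeros, like test_X
rng = np.random.default_rng(0)
train_X = sp.random(2000, 112, density=0.08, format="csr", random_state=0)
y_train = rng.integers(0, 3, size=2000)
test_X = sp.random(500, 112, density=0.08, format="csr", random_state=1)

# algorithm='auto' uses brute force for sparse input
clf = KNeighborsClassifier(n_neighbors=7).fit(train_X, y_train)
y_pred_clf = predict_in_batches(clf, test_X, batch_size=100)
print(y_pred_clf.shape)  # (500,)
```

Because each test row's neighbors are found independently, the batched predictions match a single predict call; a smaller batch_size lowers peak memory at the cost of more Python-level calls.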