
'multiclass-multioutput is not supported' error in scikit-learn with a KNN classifier


I have two variables X and Y.

The structure of X (i.e. an np.array):

[[26777 24918 26821 ...    -1    -1    -1]
[26777 26831 26832 ...    -1    -1    -1]
[26777 24918 26821 ...    -1    -1    -1]
...
[26811 26832 26813 ...    -1    -1    -1]
[26830 26831 26832 ...    -1    -1    -1]
[26830 26831 26832 ...    -1    -1    -1]]

The structure of Y:

[[1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [25197, 26777, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [26764, 25803, 26781], [26764, 25803, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [1252, 26777, 16172], [1252, 26777, 16172]]

Each inner list in Y, for example [1252, 26777, 26831], holds three separate target values, one label per output.

I am using the KNN classifier from scikit-learn:

from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X, Y)
predictions = classifier.predict(X)
print(accuracy_score(Y, predictions))

But I get an error saying:

ValueError: multiclass-multioutput is not supported

I guess the structure of Y is not supported. What changes do I need to make for the program to run?

Input :

  Deluxe Single room with sea view

Expected Output :

c_class = Deluxe
c_occ = single
c_view = sea

Solution

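  • The error is raised by accuracy_score, which does not accept multiclass-multioutput targets; KNeighborsClassifier itself can be fitted on a two-dimensional Y.

    For your problem, you can wrap the estimator in MultiOutputClassifier(), which fits one classifier per target column and provides a score() method that works on multi-output targets.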

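    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.neighbors import KNeighborsClassifier

    # Wrap KNN so that one classifier is fitted per column of Y
    knn = KNeighborsClassifier(n_neighbors=3)
    classifier = MultiOutputClassifier(knn, n_jobs=-1)
    classifier.fit(X, Y)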
    

    Working example:

    >>> from sklearn.feature_extraction.text import TfidfVectorizer
    >>> corpus = [
    ...     'This is the first document.',
    ...     'This document is the second document.',
    ...     'And this is the third one.',
    ...     'Is this the first document?',
    ... ]
    >>> vectorizer = TfidfVectorizer()
    >>> X = vectorizer.fit_transform(corpus)
    
    >>> Y = [[124323,1234132,1234],[124323,4132,14],[1,4132,1234],[1,4132,14]]
    
    >>> import numpy as np
    >>> from sklearn.multioutput import MultiOutputClassifier
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> knn = KNeighborsClassifier(n_neighbors=3)
    >>> classifier = MultiOutputClassifier(knn, n_jobs=-1)
    >>> classifier.fit(X,Y)
    >>> predictions = classifier.predict(X)
    >>> predictions
    array([[124323,   4132,     14],
           [124323,   4132,     14],
           [     1,   4132,   1234],
           [124323,   4132,     14]])
    
    >>> classifier.score(X,np.array(Y))
    0.5
    
    >>> test_data = ['I want to test this']
    >>> classifier.predict(vectorizer.transform(test_data))
    array([[124323,   4132,     14]])
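
If you still want accuracy_score-style numbers rather than classifier.score(), one option is to evaluate each target column separately, or to compute subset accuracy by hand. The following is a minimal sketch on made-up toy data (the arrays X and Y below are illustrative assumptions, not the asker's data); it also shows that calling accuracy_score directly on the 2D targets raises the ValueError from the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): 6 samples, 2 features, 3 target columns.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
Y = np.array([
    [10, 100, 7],
    [10, 100, 7],
    [10, 100, 8],   # third label deliberately differs from its neighbours
    [20, 100, 8],   # second label deliberately differs from its neighbours
    [20, 200, 8],
    [20, 200, 8],
])

classifier = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=3))
classifier.fit(X, Y)
predictions = classifier.predict(X)

# accuracy_score rejects a 2D array of multiclass targets.
try:
    accuracy_score(Y, predictions)
    raised = False
except ValueError as err:
    raised = True
    print("accuracy_score failed:", err)

# Option 1: accuracy of each target column on its own.
per_output = [
    accuracy_score(Y[:, i], predictions[:, i]) for i in range(Y.shape[1])
]
print("per-output accuracy:", per_output)

# Option 2: subset accuracy, where a row counts only if all targets match.
subset_accuracy = float(np.mean(np.all(Y == predictions, axis=1)))
print("subset accuracy:", subset_accuracy)
```

Per-output accuracy tells you which of the three labels is hardest to predict, while subset accuracy is stricter because a single wrong label makes the whole row count as wrong. The subset figure should agree with what classifier.score(X, Y) reports.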