python machine-learning classification multiclass-classification

Creating and testing a classifier


I have two columns in an Excel file. Column 1 has the exact user input, and column 2 has its cause, e.g.

COLUMN 1                                  COLUMN 2
money deducted                            cause 1
delivery is late                          cause 2
something here                            cause 48
payment problem                           cause 1
.                                         .
.                                         .

The task is to implement a classifier that, given a new user input, assigns it to one of the causes, i.e. the classifier should learn from these cases and predict the cause for future inputs.

I have some knowledge of classification, but I would like an idea of how to implement this using a one-vs-rest classifier.


Solution

  • Here is how you might implement this classifier using scikit-learn. Pass all training sentences as X_train, and give the corresponding labels as indices into target_names.

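    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    X_train = np.array(["money deducted",
                        "delivery is late",
                        "something here",
                        "payment problem"])
    # Each label is a 0-based index into target_names.
    target_names = ['cause1', 'cause2', 'cause48']
    y_labels = [(0, ), (1, ), (2, ), (0, )]
    # Keep the binarizer in a variable so predictions can be decoded later.
    mlb = MultiLabelBinarizer()
    y_train = mlb.fit_transform(y_labels)
    classifier = Pipeline([
        ('vectorizer', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', OneVsRestClassifier(LinearSVC()))])
    classifier.fit(X_train, y_train)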
    

    That is all it takes to train the classifier; after that you can predict on any new input. For reference, see: http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
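To make the "one vs rest" part concrete, here is a minimal sketch that inspects a fitted pipeline. It assumes the same four toy sentences with 0-based labels; the names `texts`, `pipe`, `ovr`, and `scores` are chosen only for this illustration. The point is that OneVsRestClassifier fits one binary LinearSVC per cause, and each of them produces its own score for a given input.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy data (assumption): four inputs, three distinct causes, 0-based labels.
texts = ["money deducted", "delivery is late", "something here", "payment problem"]
y = MultiLabelBinarizer().fit_transform([(0,), (1,), (2,), (0,)])

pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
pipe.fit(texts, y)

# One binary classifier is trained per cause ("this cause" vs "all the rest").
ovr = pipe.named_steps["clf"]
print(len(ovr.estimators_))

# decision_function returns one score per cause for each input row.
scores = pipe.decision_function(["payment problem"])
print(scores.shape)
```

With many causes (the question mentions a "cause 48"), this means one binary model per cause, which is usually still cheap for linear models on short texts.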

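    Note that the binarizer must be kept in a variable (mlb here) when y_labels is transformed, so that its mapping can be reused to decode predictions:

    mlb = MultiLabelBinarizer()
    y_train = mlb.fit_transform(y_labels)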
    

    Then predict as follows, where X_test holds the new user inputs:

    mlb.inverse_transform(classifier.predict(X_test))
    

    This gives a tuple of label indices for each input, which you can then use to look up the cause in target_names.
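Putting the steps together, the following is a rough end-to-end sketch rather than a definitive implementation. It assumes the two spreadsheet columns have already been read into a list of (input, cause) pairs; `rows`, `name_to_index`, and the two test sentences are made up for illustration. It derives target_names and 0-based label indices from the cause column, trains the pipeline, and maps predictions back to cause names.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy rows mirroring the question's two columns (assumption: already loaded).
rows = [
    ("money deducted", "cause 1"),
    ("delivery is late", "cause 2"),
    ("something here", "cause 48"),
    ("payment problem", "cause 1"),
]

# Build target_names and 0-based label indices from the cause column.
target_names = sorted({cause for _, cause in rows})
name_to_index = {name: i for i, name in enumerate(target_names)}
X_train = np.array([text for text, _ in rows])
y_labels = [(name_to_index[cause],) for _, cause in rows]

# Keep the fitted binarizer so predictions can be decoded afterwards.
mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(y_labels)

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, y_train)

# Hypothetical new user inputs.
X_test = np.array(["delivery is very late", "money was deducted twice"])
predicted = mlb.inverse_transform(classifier.predict(X_test))

# Each prediction is a tuple of label indices; it can be empty if no
# binary classifier fires for that input.
for text, label_indices in zip(X_test, predicted):
    causes = [target_names[i] for i in label_indices]
    print(text, "->", causes)
```

Because every binary classifier decides independently, an input may come back with no cause or with more than one, so the lookup handles a tuple of indices rather than a single index.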

    Hope it helps!