
SVC with linear kernel incorrect accuracy


The model's performance does not improve across training epochs when the values are sorted by a specific row key. The dataset is balanced and has 40,000 records with binary labels (0, 1).

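from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

Linear_SVC_classifier = SVC(kernel='linear', random_state=1)  # supervised learning
Linear_SVC_classifier.fit(x_train, y_train)
SVC_Prediction = Linear_SVC_classifier.predict(x_test)  # predictions were never computed before
SVC_Accuracy = accuracy_score(y_test, SVC_Prediction)
print("\n\n\nLinear SVM Accuracy: ", SVC_Accuracy)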
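Because `train_test_split` shuffles rows by default (`shuffle=True`), sorting the data by a row key should not by itself push one class out of the test set. The sketch below uses a small hypothetical sorted dataset in place of the real 40,000 records, and additionally passes `stratify=y` (not in the original code) so the 0/1 ratio is kept identical in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data: 40 rows sorted so that
# every class-0 row comes first and every class-1 row comes last.
x = np.arange(40).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 20)

# shuffle=True is the default; stratify=y also preserves the class ratio.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

print("test rows:", len(y_test), "class-1 rows in test:", int(y_test.sum()))
```

With 20 rows per class and `test_size=0.2`, the stratified test set gets 8 rows, 4 from each class.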

Solution

  • If x holds raw text, convert the training text into token counts with a CountVectorizer and fit a logistic regression model on those counts:

    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # X is a sequence of raw text documents, y the 0/1 labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    cv = CountVectorizer()
    ctmTr = cv.fit_transform(X_train)   # learn the vocabulary from the training text
    X_test_dtm = cv.transform(X_test)   # reuse that vocabulary for the test text

    model = LogisticRegression()
    model.fit(ctmTr, y_train)

    y_pred_class = model.predict(X_test_dtm)

    LR_Accuracy = accuracy_score(y_test, y_pred_class)
    print("\n\n\nLogistic Regression Accuracy: ", LR_Accuracy)
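The same CountVectorizer + LogisticRegression steps can also be chained with scikit-learn's `make_pipeline`, which guarantees that the vocabulary is fitted on the training text only. This is a minimal sketch on a tiny hypothetical corpus (the sentences and labels are made up, not taken from the question's data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus; replace with the real text column and labels.
X_train = ["good movie", "great film", "loved it",
           "bad movie", "awful film", "hated it"]
y_train = [1, 1, 1, 0, 0, 0]
X_test = ["great movie", "awful movie"]
y_test = [1, 0]

# fit() runs fit_transform on the training text; predict() only runs
# transform on new text, so the test set never changes the vocabulary.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

y_pred_class = model.predict(X_test)
LR_Accuracy = accuracy_score(y_test, y_pred_class)
print("Logistic Regression Accuracy:", LR_Accuracy)
```

The pipeline is optional; it simply removes the risk of accidentally calling `fit_transform` on the test text.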
    

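The logistic regression model above plays roughly the same role as a linear-kernel SVC: both learn a linear decision boundary, but logistic regression minimizes log loss while the SVC minimizes hinge loss, so the two are comparable rather than strictly equivalent. The SVC version, fitted on the same document-term matrix, is: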

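    from sklearn.svm import SVC
    Linear_SVC_classifier = SVC(kernel='linear', random_state=1)
    Linear_SVC_classifier.fit(ctmTr, y_train)
    print("Linear SVM Accuracy: ", accuracy_score(y_test, Linear_SVC_classifier.predict(X_test_dtm)))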
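To see how close the two linear models are in practice, one option is to fit both on the same document-term matrix and compare their test accuracy. This is a sketch on a tiny hypothetical corpus (made-up sentences, not the question's data); on real data the two scores will usually be similar but need not be identical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Tiny hypothetical corpus standing in for the real text data.
X_train = ["good movie", "great film", "loved it",
           "bad movie", "awful film", "hated it"]
y_train = [1, 1, 1, 0, 0, 0]
X_test = ["great movie", "awful movie"]
y_test = [1, 0]

cv = CountVectorizer()
ctmTr = cv.fit_transform(X_train)   # vocabulary learned from training text only
X_test_dtm = cv.transform(X_test)   # same vocabulary applied to test text

# Both are linear classifiers; they differ in the loss they minimize
# (logistic loss vs. hinge loss), so their predictions can differ.
models = {
    "Logistic Regression": LogisticRegression(),
    "Linear SVM": SVC(kernel="linear"),
}
scores = {}
for name, clf in models.items():
    clf.fit(ctmTr, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test_dtm))
    print(name, "accuracy:", scores[name])
```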