Tags: python, machine-learning, scikit-learn, data-science, cross-validation

How do I switch from a single train/test split to cross-validation?


from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report

# data: my pandas DataFrame with 'Review' (text) and 'Category' (label) columns
X = data['Review']
y = data['Category']

tfidf = TfidfVectorizer(ngram_range=(1, 1))
classifier = LinearSVC()

# Single hold-out split: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = Pipeline([
    ('tfidf', tfidf),
    ('clf', classifier),
])

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

This code trains a model and makes predictions on a single test split. I want a more reliable estimate of my model's performance, so what do I need to change to use cross_val_score instead?
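A minimal sketch of the change being asked about: the Pipeline itself stays the same, and train_test_split, fit and predict are replaced by one cross_val_score call on the full X and y. The ten reviews and the "pos"/"neg" labels below are made-up toy data standing in for data['Review'] and data['Category'], and the 5-fold StratifiedKFold is an assumed choice, not something taken from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; replace with data['Review'] and data['Category']
X = [
    "great product, works well", "terrible, broke after a day",
    "love it, highly recommend", "awful quality, do not buy",
    "excellent value and fast shipping", "worst purchase ever",
    "very happy with this", "complete waste of money",
    "fantastic, exceeded expectations", "poor build, returned it",
]
y = ["pos", "neg"] * 5  # labels alternate to match the reviews above

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("clf", LinearSVC()),
])

# No train_test_split: cross_val_score refits the whole pipeline on each
# training fold, so the TF-IDF vocabulary is never fitted on the held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(scores)           # one accuracy value per fold
print(np.mean(scores))  # average accuracy over the 5 folds
```

Passing the whole Pipeline (rather than pre-vectorized text) is the design choice that matters here: fitting TfidfVectorizer once on all data before cross-validation would leak vocabulary and IDF statistics from the test folds into training.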


Solution

  • Use something like this (it is an example from one of my previous projects):

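    import numpy as np
    from sklearn.model_selection import KFold, cross_val_score

    # 5 shuffled folds; random_state makes the split reproducible
    kfolds = KFold(n_splits=5, shuffle=True, random_state=42)

    def cv_f1(model, X, y):
        # "f1" assumes a binary target; for multi-class labels use
        # e.g. "f1_macro" or "f1_weighted" instead
        score = np.mean(cross_val_score(model, X, y,
                                        scoring="f1",
                                        cv=kfolds))
        return score

    model = ...  # your estimator, e.g. the Pipeline built in the question

    score_f1 = cv_f1(model, X_train, y_train)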
    

    You can use other metrics by changing the scoring="f1" argument to another scorer name (for example "accuracy" or "f1_macro"). If you want to see the score for each fold, just remove the np.mean call.
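If you want several metrics at once rather than one scoring string, scikit-learn's cross_validate accepts a list of scorer names and returns the per-fold scores for each. The sketch below also uses cross_val_predict to get out-of-fold predictions, which lets you keep the classification_report from the question. The fifteen reviews and the three labels ("pos", "neg", "neutral") are hypothetical toy data; they are multi-class on purpose, which is why "f1_macro" is used instead of the binary-only "f1".

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data with three classes, in pos/neg/neutral order
X = [
    "great product", "terrible quality", "it is okay",
    "love it", "broke quickly", "average, nothing special",
    "highly recommend", "do not buy", "does the job",
    "excellent value", "waste of money", "fine for the price",
    "fantastic service", "very disappointing", "neither good nor bad",
]
y = ["pos", "neg", "neutral"] * 5

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Several metrics in one run; results holds "test_<metric>" arrays per fold
results = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
print(results["test_accuracy"])         # accuracy for each fold
print(results["test_f1_macro"].mean())  # mean macro-averaged F1

# Each sample is predicted by a model that did not see it during training,
# so one classification report summarizes all folds together.
y_pred = cross_val_predict(clf, X, y, cv=cv)
print(classification_report(y, y_pred, zero_division=0))
```

On real data, the per-fold arrays are usually worth reporting alongside the mean, since a large spread between folds suggests the single train/test split in the question could give a misleading number.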