python machine-learning scikit-learn svm tfidfvectorizer

TfidfVectorizer - Vocabulary wasn't fitted


Code to load a saved model and run a prediction on a single input:

import pickle
from sklearn.feature_extraction.text import TfidfVectorizer

Tfidf_vect = TfidfVectorizer(max_features=5000)  # same settings as the vectorizer used in training
Train_X_IP = Tfidf_vect.transform(["change in the meaning"]).toarray()  # vectorize the input
loaded_model = pickle.load(open("finalized_model.sav", 'rb'))  # load the model

predictions_SVM = loaded_model.predict_proba(Train_X_IP)
print(predictions_SVM)


Error I get: TfidfVectorizer - Vocabulary wasn't fitted.

I saw many articles suggesting different approaches. So far I have tried:

First, I replaced Tfidf_vect.transform with fit_transform, but that did not solve the issue.

Second, I tried saving the TfidfVectorizer to a file and loading it back:

Tfidf_vect = TfidfVectorizer(max_features=5000)
import pickle
import joblib
pickle.dump(Tfidf_vect, open("vectorizer.pickle", "wb"))
multilabel_binarizer = joblib.load('vectorizer.pickle')

I still get the same error: TfidfVectorizer - Vocabulary wasn't fitted.

Is this the correct way to use the model and the vectorizer?


Solution

  • You need the same vectorizer that you used to train the model in the first place. Creating a new TfidfVectorizer, or pickling one before it has been fitted, gives you an object with no vocabulary, which is why transform raises the error. Presumably you called fit or fit_transform during training. After that call, save the fitted vectorizer in pickle or joblib format. Then load it back and call transform (not fit_transform) on the new data before making predictions.
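
The steps above could look roughly like the sketch below. It is only an illustration under assumptions: the toy corpus, the labels, and the SVC settings are hypothetical stand-ins for the real training setup, which the question does not show. The file names are the ones from the question, written to a temporary directory here.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy corpus and labels, for illustration only.
train_texts = [
    "the meaning of the word changed",
    "a change in the meaning of a phrase",
    "words shift meaning over time",
    "the sentence has a new meaning",
    "meaning depends on the context",
    "context can change what a word means",
    "the stock price fell sharply",
    "markets closed lower on friday",
    "investors sold shares after the report",
    "the bank raised interest rates",
    "shares rose as profits grew",
    "the price of oil dropped again",
]
train_labels = [0] * 6 + [1] * 6

# Write the artifacts to a temporary directory instead of the working directory.
workdir = tempfile.mkdtemp()
vectorizer_path = os.path.join(workdir, "vectorizer.pickle")
model_path = os.path.join(workdir, "finalized_model.sav")

# --- Training time ---
# fit_transform learns the vocabulary and IDF weights from the training text.
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_texts)

# Assumed classifier settings; probability=True is needed for predict_proba.
model = SVC(kernel="linear", probability=True, random_state=0)
model.fit(X_train, train_labels)

# Save the FITTED vectorizer together with the model.
with open(vectorizer_path, "wb") as f:
    pickle.dump(vectorizer, f)
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# --- Prediction time (normally a separate script) ---
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

# transform only: reuse the learned vocabulary, do not refit on the new input.
X_new = loaded_vectorizer.transform(["change in the meaning"])
probs = loaded_model.predict_proba(X_new)
print(probs)
```

Calling fit_transform on the single new sentence would learn a fresh vocabulary from that sentence alone, so the feature columns would no longer match the ones the model was trained on. That is why only transform is used at prediction time.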