python machine-learning scikit-learn random-forest sklearn-pandas

Number of features of the model must match the input. Model n_features is 20 and input n_features is 4


I am getting this error while using a random forest classifier. Here is my code:

import quandl, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer

train = pd.read_csv("train.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
X = X1.toarray()
corpus = ['tell me your name']
t1 = vectorizer.fit_transform(corpus)
t = t1.toarray()
number = LabelEncoder()
train['answer'] = number.fit_transform(train['answer'].astype('str'))
features = ['question', 'answer']
y = train['question'].values
clf = RandomForestClassifier(n_estimators=20)
clf.fit(X, y)
predicted_result = clf.predict(t)

Solution

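  • Use the same fitted vectorizer for both the training and the test data. Calling fit_transform a second time refits the vocabulary on the new data only, so the corpus 'tell me your name' becomes a vector with 4 features (one per word) instead of the 20 features the model was trained on. Fit on the training data once, then call transform on anything you want to predict: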

    # learn the vocabulary from the training data
    X1 = vectorizer.fit_transform(train['question'])
    # reuse that vocabulary for the new text (no refit)
    t1 = vectorizer.transform(corpus)
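
For reference, here is a minimal self-contained sketch of the same fix. The small `questions` and `answers` lists are made-up stand-ins for `train.csv`, and using the answers as the target is an assumption, so treat this as an illustration rather than a drop-in replacement for the code in the question.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the 'question' and 'answer' columns of train.csv
questions = [
    "what is your name",
    "how old are you",
    "where do you live",
    "tell me your age",
]
answers = ["name", "age", "location", "age"]

# Fit the vectorizer once, on the training text only
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(questions)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X, answers)

# Transform new text with the already-fitted vectorizer
corpus = ["tell me your name"]
t = vectorizer.transform(corpus)

# Same number of columns as the training matrix, so predict() accepts it
print(X.shape[1], t.shape[1])
predicted_result = clf.predict(t)

# For comparison: refitting on the new text alone yields only 4 columns,
# which is what triggers the feature-count mismatch error
t_bad = CountVectorizer().fit_transform(corpus)
print(t_bad.shape[1])
```

Words in the new text that were not seen during fitting (here "me") are simply ignored by `transform`, which is why the column count stays fixed.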