
NotFittedError - Titanic Project Kaggle


I am working through different machine learning projects on Kaggle to improve my skills. Here is the model I am using:

from sklearn.ensemble import RandomForestClassifier

y = train_data["Survived"]

features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])

model = RandomForestClassifier(n_estimators = 100, max_depth = 5, random_state = 1)
model.fit = (X, y)
predictions = model.predict(X_test)

output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('submission.csv', index = False)
print('Your submission was successfully saved!')

Here is the error I get:

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
/tmp/ipykernel_33/1528591149.py in <module>
      9 forest_clf = RandomForestClassifier(n_estimators = 100, max_depth = 5, random_state = 1)
     10 forest_clf.fit = (X, y)
---> 11 predictions = forest_clf.predict(X_test)
     12 
     13 output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})

/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in predict(self, X)
    806             The predicted classes.
    807         """
--> 808         proba = self.predict_proba(X)
    809 
    810         if self.n_outputs_ == 1:

/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in predict_proba(self, X)
    846             classes corresponds to that in the attribute :term:`classes_`.
    847         """
--> 848         check_is_fitted(self)
    849         # Check data
    850         X = self._validate_X_predict(X)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1220 
   1221     if not fitted:
-> 1222         raise NotFittedError(msg % {"name": type(estimator).__name__})
   1223 
   1224 

NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

I think this might be the estimator cloning itself somewhere, but I am not sure which line causes the problem. This is the Titanic project on Kaggle, and I copied its tutorial code while trying to learn. Any help is appreciated.
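For reference, the last frame of the traceback is sklearn.utils.validation.check_is_fitted. In the scikit-learn versions I have looked at, calling it without an attributes argument treats an estimator as fitted only if fit has created attributes whose names end in an underscore (for a random forest, things like estimators_ and classes_). Nothing is cloned; the check simply finds no such attributes. The sketch below uses a tiny made-up dataset rather than the Titanic data, and is_fitted is a hypothetical helper written for this example, not part of scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

# Tiny made-up dataset, for illustration only (not the Titanic data)
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([0, 1, 1, 0])

clf = RandomForestClassifier(n_estimators=10, random_state=1)


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted accepts the estimator."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


print(is_fitted(clf))  # False: fit has not run, so no trailing-underscore attributes exist
clf.fit(X, y)          # a real method call: trains the forest
print(is_fitted(clf))  # True: fit created attributes such as estimators_ and classes_
```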


Solution

  • As @Blackgaurd pointed out, just change model.fit = (X, y) to model.fit(X, y).

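    Your current code never calls fit. The line model.fit = (X, y) assigns the tuple (X, y) to an attribute named fit on your RandomForestClassifier instance, which overwrites (shadows) its fit method. The model is therefore never trained, so predict raises NotFittedError.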

    Here is your full code with the correction:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    
    y = train_data["Survived"]
    
    features = ["Pclass", "Sex", "SibSp", "Parch"]
    X = pd.get_dummies(train_data[features])
    X_test = pd.get_dummies(test_data[features])
    
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
    model.fit(X, y) # <- line of code fixed
    predictions = model.predict(X_test)
    
    output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
    output.to_csv('submission.csv', index=False)
    print('Your submission was successfully saved!')
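A minimal standalone reproduction, assuming a made-up four-row toy dataset in place of the Titanic features, shows what the original assignment actually does: it stores a tuple in the instance's attribute dictionary, which hides RandomForestClassifier.fit, so predict still finds no fitted attributes. Deleting the instance attribute (or just creating a fresh classifier) makes the real method visible again. This is a sketch for illustration, not the Kaggle tutorial code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Made-up toy data standing in for the Titanic features (assumption)
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([0, 1, 1, 0])

buggy = RandomForestClassifier(n_estimators=10, random_state=1)
buggy.fit = (X, y)  # assignment, not a call: stores a tuple, trains nothing

# The instance attribute now shadows the class's fit method
print(type(buggy.fit))       # <class 'tuple'>
print("fit" in vars(buggy))  # True: fit now lives in the instance __dict__

try:
    buggy.predict(X)
except NotFittedError as exc:
    print("predict failed:", type(exc).__name__)

# Removing the instance attribute exposes RandomForestClassifier.fit again
del buggy.fit
buggy.fit(X, y)                # now a genuine method call
print(buggy.predict(X).shape)  # (4,)
```

In a notebook, the simplest recovery is to re-run the cell that creates the classifier, since a new instance has no shadowing attribute.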