I have the code below for a decision tree classifier. I can see the model's predictions, but I am not able to draw the tree.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.tree import plot_tree
from sklearn.tree import DecisionTreeClassifier

# Impute and scale numeric values
num_piepline = Pipeline([
    ("imputer", SimpleImputer(missing_values=np.nan, strategy="median")),
    ('scalar1', StandardScaler()),
])

# One-hot encode categorical values
cat_pipeline = Pipeline([('onehot', OneHotEncoder(handle_unknown='ignore'))])

full_pipeline = ColumnTransformer(
    transformers=[
        ('num', num_piepline, ['a', 'b', 'c', 'd']),
        ('cat', cat_pipeline, ['e'])
    ])

decisiontree_entropy_model = Pipeline(steps=[
    ('dt_preprocessor', full_pipeline),
    ('dt_classifier', DecisionTreeClassifier(random_state=2021, max_depth=3, criterion='entropy'))])

decisiontree_entropy_model.fit(X_train, y_train)
dte_y_pred = decisiontree_entropy_model.predict(X_train)

fig = plt.figure(figsize=(25,20))
plot_tree(decisiontree_entropy_model_clf)
I get the error stack trace below.
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
<ipython-input-151-da85340c2477> in <module>
1 from sklearn.tree import plot_tree
2 fig = plt.figure(figsize=(25,20))
----> 3 plot_tree(decisiontree_entropy_model_clf)
4
5 # from IPython.display import Image
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\Anaconda3\lib\site-packages\sklearn\tree\_export.py in plot_tree(decision_tree, max_depth, feature_names, class_names, label, filled, impurity, node_ids, proportion, rotate, rounded, precision, ax, fontsize)
178 """
179
--> 180 check_is_fitted(decision_tree)
181
182 if rotate != 'deprecated':
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1017
1018 if not attrs:
-> 1019 raise NotFittedError(msg % {'name': type(estimator).__name__})
1020
1021
NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I ran fit on the model and I can see the classification_report for it, but plotting the same model raises a NotFittedError. Does the pipeline instance not persist after we call fit once? I am not sure why it fails only when plotting the tree, when it worked for deriving the performance metrics from the classification report.
There is nothing named decisiontree_entropy_model_clf in your code; to plot the decision tree from the pipeline, you should use
plot_tree(decisiontree_entropy_model['dt_classifier'])
after the pipeline has been fitted (the tree does not even exist before fitting).
For accessing various attributes of a pipeline in general, see Getting model attributes from pipeline.
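As a minimal, self-contained sketch of the above: the toy DataFrame, the reduced column list, and the names model, preprocessor, and fitted_tree below are hypothetical stand-ins for your own data and variables, and the Agg backend is chosen only so the snippet runs without a display. It fits a pipeline shaped like yours, pulls the fitted classifier out by its step name, and passes that (not the pipeline) to plot_tree.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical toy data standing in for X_train / y_train.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "a": rng.normal(size=60),
    "b": rng.normal(size=60),
    "e": rng.choice(["x", "y", "z"], size=60),
})
y_train = (X_train["a"] > 0).astype(int)

# Same structure as the pipeline in the question, with fewer columns.
preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline([("imputer", SimpleImputer(strategy="median")),
                      ("scaler", StandardScaler())]), ["a", "b"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["e"]),
])
model = Pipeline(steps=[
    ("dt_preprocessor", preprocessor),
    ("dt_classifier", DecisionTreeClassifier(random_state=2021, max_depth=3,
                                             criterion="entropy")),
])
model.fit(X_train, y_train)

# Indexing by step name and named_steps both return the fitted classifier.
fitted_tree = model["dt_classifier"]
same_tree = model.named_steps["dt_classifier"]

# plot_tree expects the tree estimator itself, not the enclosing Pipeline.
fig, ax = plt.subplots(figsize=(12, 8))
annotations = plot_tree(fitted_tree, filled=True, ax=ax)
```

In a notebook you can drop the matplotlib.use("Agg") line and the figure will render inline. Note that the split thresholds shown in the plot refer to the preprocessed (scaled and one-hot encoded) columns, since that is what the classifier was trained on.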