Tags: python, scikit-learn, pipeline, decision-tree

Not able to plot tree from pipeline


I have the code below for a decision tree classifier. I am able to see the model's predictions, but I am not able to draw the tree.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.tree import plot_tree
from sklearn.tree import DecisionTreeClassifier

# Scale numeric values
num_piepline = Pipeline([
    ("imputer", SimpleImputer(missing_values=np.nan, strategy="median")),
    ('scalar1', StandardScaler()),
])

# One-hot encode categorical values
cat_pipeline = Pipeline([('onehot', OneHotEncoder(handle_unknown='ignore'))])

full_pipeline = ColumnTransformer(
    transformers=[
        ('num', num_piepline, ['a', 'b', 'c', 'd']),
        ('cat', cat_pipeline, ['e'])
        
    ])

decisiontree_entropy_model = Pipeline(steps=[
    ('dt_preprocessor', full_pipeline),
    ('dt_classifier', DecisionTreeClassifier(random_state=2021, max_depth=3, criterion='entropy'))])

decisiontree_entropy_model.fit(X_train, y_train)

dte_y_pred = decisiontree_entropy_model.predict(X_train)

fig = plt.figure(figsize=(25,20))
plot_tree(decisiontree_entropy_model_clf)

I get the error stack trace below.

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-151-da85340c2477> in <module>
      1 from sklearn.tree import plot_tree
      2 fig = plt.figure(figsize=(25,20))
----> 3 plot_tree(decisiontree_entropy_model_clf)
      4 
      5 # from IPython.display import Image

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\Anaconda3\lib\site-packages\sklearn\tree\_export.py in plot_tree(decision_tree, max_depth, feature_names, class_names, label, filled, impurity, node_ids, proportion, rotate, rounded, precision, ax, fontsize)
    178     """
    179 
--> 180     check_is_fitted(decision_tree)
    181 
    182     if rotate != 'deprecated':

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1017 
   1018     if not attrs:
-> 1019         raise NotFittedError(msg % {'name': type(estimator).__name__})
   1020 
   1021 

NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Here I ran fit on the model and I can see the classification_report for it, but plotting the tree raises a NotFittedError. Does the pipeline instance stop existing after we call fit once? I am not sure why it fails only when plotting the tree, when it worked for deriving the performance metrics from the classification report.


Solution

  • The code you posted never defines decisiontree_entropy_model_clf, and plot_tree expects the tree estimator itself, not the whole pipeline; to plot the decision tree from the pipeline, you should use

    plot_tree(decisiontree_entropy_model['dt_classifier'])
    

    after the pipeline has been fitted (the tree does not even exist before fitting).

    For accessing various attributes of a pipeline in general, see Getting model attributes from pipeline.